Preface to the Second Edition



This second edition is a revised and extended version. It offers improved
descriptions, new topics and extended references. The contents of this book are
the basis of a lecture on Digital Audio Signal Processing at the Hamburg University of 
Technology (TU Hamburg-Harburg) and a lecture on Multimedia Signal Processing at the 
Helmut Schmidt University, Hamburg. For further studies you can find interactive audio 
demonstrations, exercises and Matlab examples on the web site 

http://ant.hsu-hh.de/dasp/

Besides the basics of digital audio signal processing introduced in this second edition, 
further advanced algorithms for digital audio effects can be found in the book DAFX - 
Digital Audio Effects (Ed. U. Zölzer) with the related web site

http://www.dafx.de

My thanks go to Professor Dieter Leckschat, Dr. Gerald Schuller, Udo Ahlvers, 
Mijail Guillemard, Christian Helmrich, Martin Holters, Dr. Florian Keiler, Stephan Möller,
François-Xavier Nsabimana, Christian Ruwwe, Harald Schorr, Dr. Oomke Weikert,
Catja Wilkens and Christian Zimmermann. 

Udo Zölzer
Hamburg, June 2008 




Preface to the First Edition 



Digital audio signal processing is employed in recording and storing music and speech 
signals, for sound mixing and production of digital programs, in digital transmission to 
broadcast receivers as well as in consumer products like CDs, DATs and PCs. In the latter 
case, the audio signal is in a digital form all the way from the microphone right up to the 
loudspeakers, enabling real-time processing with fast digital signal processors. 

This book provides the basis of an advanced course in Digital Audio Signal Processing 
which I have been giving since 1992 at the Technical University Hamburg-Harburg. 
It is directed at students of engineering, computer science and physics, and also
at professionals looking for solutions to problems in audio signal processing in
fields such as studio engineering, consumer electronics and multimedia. The mathematical
and theoretical fundamentals of digital audio signal processing systems will be presented 
and typical applications with an emphasis on realization aspects will be discussed. Prior 
knowledge of systems theory, digital signal processing and multirate signal processing is 
taken as a prerequisite. 

The book is divided into two parts. The first part (Chapters 1-4) presents a basis 
for hardware systems used in digital audio signal processing. The second part (Chapters 
5-9) discusses algorithms for processing digital audio signals. Chapter 1 describes the 
course taken by an audio signal from its recording in a studio up to its reproduction 
at home. Chapter 2 contains a representation of signal quantization, dither techniques 
and spectral shaping of quantization errors used for reducing the nonlinear effects of 
quantization. Finally, the fixed-point and floating-point number representations are
compared, together with their effects on format conversion and
algorithms. Chapter 3 describes methods for AD/DA conversion of signals, starting with
Nyquist sampling and continuing with oversampling techniques and delta-sigma modulation. The
chapter closes with a presentation of some circuit designs for AD/DA converters. After an
introduction to digital signal processors and digital audio interfaces, Chapter 4 describes
simple hardware systems based on single- and multiprocessor solutions. The algorithms
introduced in the following Chapters 5-9 are, to a great extent, implemented in real-time 
on hardware platforms presented in Chapter 4. Chapter 5 describes digital audio equalizers. 
Apart from the implementation aspects of recursive audio filters, nonrecursive linear phase 
filters based on fast convolution and filter banks are introduced. Filter designs, parametric 
filter structures and precautions for reducing quantization errors in recursive filters are dealt 
with in detail. Chapter 6 deals with room simulation. Methods for simulating artificial
room impulse responses and methods for approximating measured impulse responses
are discussed. In Chapter 7 the dynamic range control of audio signals is described. 
These methods are applied at several positions in the audio chain from the microphone 
up to the loudspeakers in order to adapt to the dynamics of the recording, transmission 
and listening environment. Chapter 8 contains a presentation of methods for synchronous 
and asynchronous sampling rate conversion. Efficient algorithms are described which are 
suitable for real-time processing as well as off-line processing. Both lossless and lossy 
audio coding are discussed in Chapter 9. Lossless audio coding is applied to the storage of
signals with higher word-lengths. Lossy audio coding, on the other hand, plays a significant role in
communication systems. 

I would like to thank Prof. Fliege (University of Mannheim), Prof. Kammeyer 
(University of Bremen) and Prof. Heute (University of Kiel) for comments and support. 
I am also grateful to my colleagues at the TUHH and especially Dr. Alfred Mertins, 
Dr. Thomas Boltze, Dr. Bernd Redmer, Dr. Martin Schönle, Dr. Manfred Schusdziarra,
Dr. Tanja Karp, Georg Dickmann, Werner Eckel, Thomas Scholz, Rüdiger Wolf, Jens
Wohlers, Horst Zölzer, Bärbel Erdmann, Ursula Seifert and Dieter Gödecke. Apart from
these, I would also like to say a word of gratitude to all those students who helped me in 
carrying out this work successfully. 

Special thanks go to Saeed Khawaja for his help during translation and to Dr. Anthony 
Macgrath for proof-reading the text. I also would like to thank Jenny Smith, Colin 
McKerracher, Ian Stoneham and Christian Rauscher (Wiley). 

My special thanks are directed to my wife Elke and my daughter Franziska. 



Udo Zölzer
Hamburg, July 1997 




Chapter 1 

Introduction 



It is hardly possible to make a start in the field of digital audio signal processing without 
having a first insight into the variety of technical devices and systems of audio technology. 
In this introductory chapter, the fields of application for digital audio signal processing are 
presented. Starting from recording in a studio or in a concert hall, the whole chain of signal 
processing is shown, up to the reproduction at home or in a car (see Fig. 1.1). The fields of 
application can be divided into the following areas: 

• studio technology; 

• digital transmission systems; 

• storage media; 

• audio components for home entertainment. 

The basic principles of the above-mentioned fields of application are presented as an
overview in order to show where digital signal processing is used. Special technical devices
and systems are outside the focus of this chapter. These devices and systems are strongly
driven by developments in computer technology, with yearly changes and new
devices based on new technologies. The goal of this introduction is a trend-independent
presentation of the entire processing chain from the instrument or singer to the listener and
consumer of music. The signal processing techniques and their algorithms
are discussed in the following chapters.



1.1 Studio Technology 

While recording speech or music in a studio or in a concert hall, the analog signal from a
microphone is first digitized, fed to a digital mixing console and then stored on a digital
storage medium. A digital sound studio is shown in Fig. 1.2. Besides the analog sources
(microphones), digital sources are fed to the digital mixing console over multichannel
MADI interfaces [AES91]. Digital storage media like digital multitrack tape machines
have been replaced by digital hard disc recording systems which are also connected via
multichannel MADI interfaces to the mixing console. The final stereo mix is stored via a
two-channel AES/EBU interface [AES92] on a two-channel MASTER machine. External
appliances for effects or room simulators are also connected to the mixing console via
a two-channel AES/EBU interface. All systems are synchronized by a MASTER clock
reference. In digital audio technology, the sampling rates¹ fs = 48 kHz for professional
studio technology, fs = 44.1 kHz for compact disc and fs = 32 kHz for broadcasting
applications are established. In addition, multiples of these sampling frequencies such as
88.2, 96, 176.4, and 192 kHz are used. The sound mixing console plays a central role
in a digital sound studio. Figure 1.3 shows the functional units. The N input signals are
processed individually. After level and panorama control, all signals are summed up to
give a stereo mix. The summation is carried out several times so that other auxiliary stereo
and/or mono signals are available for other purposes. In a sound channel (see Fig. 1.4), an
equalizer unit (EQ), a dynamic unit (DYN), a delay unit (DEL), a gain element (GAIN)
and a panorama element (PAN) are used. In addition to input and output signals in an audio
channel, inserts as well as auxiliary or direct outputs are required.

Figure 1.1 Signal processing for recording, storage, transmission and reproduction.
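The data rates quoted in the footnote to the sampling rates follow directly from the number of channels, the bits carried per sample and the sampling rate. The short Python sketch below reproduces that arithmetic; the function name is hypothetical and chosen only for this illustration.

```python
def data_rate_bits_per_second(channels, bits_per_sample, sampling_rate_hz):
    """Raw serial data rate of a PCM stream in bit/s."""
    return channels * bits_per_sample * sampling_rate_hz


FS = 48_000  # professional studio sampling rate in Hz

# One channel of 16-bit PCM.
mono_16 = data_rate_bits_per_second(1, 16, FS)
# Two-channel AES/EBU signal: 24 + 8 bits per channel and sample.
aes_ebu = data_rate_bits_per_second(2, 24 + 8, FS)
# MADI signal: 56 channels with the same 24 + 8 bits per sample.
madi = data_rate_bits_per_second(56, 24 + 8, FS)

print(mono_16 / 1e3, "kbit/s")   # 768.0 kbit/s
print(aes_ebu / 1e6, "Mbit/s")   # 3.072 Mbit/s
print(madi / 1e6, "Mbit/s")      # 86.016 Mbit/s
```

The same function can be reused for the other established sampling rates by changing FS.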



1.2 Digital Transmission Systems 

In this section digital transmission will be briefly explained. Besides the analog wireless 
broadcasting systems based on amplitude and frequency modulation, DAB² (Digital Audio
Broadcasting) has been introduced in several countries [Hoe01]. On the other hand, the
internet has pushed audio/video distribution, internet radio and video via cable networks. 

Terrestrial Digital Broadcasting (DAB) 

With the introduction of terrestrial digital broadcasting, the quality standards of a compact
disc will be achieved for mobile and stationary reception of radio signals [Ple91].
To this end, the data rate of a two-channel AES/EBU signal from a transmitting studio
is reduced with the help of a source coder [Bra94] (see Fig. 1.5). Following the source
coder (SC), additional information (AI) like the type of program (music/speech) and
traffic information is added. A multicarrier technique is applied for digital transmission
to stationary and mobile receivers. At the transmitter, several broadcasting programs are
combined in a multiplexer (MUX) to form a multiplex signal. The channel coding and
modulation are carried out by a multicarrier transmission technique (Coded Orthogonal
Frequency Division Multiplex, COFDM [Ala87, Kam92, Kam93, Tui93]).

¹ Data rate 16 bit × 48 kHz = 768 kbit/s; data rate (AES/EBU signal) 2 × (24 + 8) bit × 48 kHz =
3.072 Mbit/s; data rate (MADI signal) 56 × (24 + 8) bit × 48 kHz = 86.016 Mbit/s.

² http://www.worlddab.org/.

Figure 1.3 N-channel sound mixing console.



Figure 1.4 Sound channel.

Figure 1.5 DAB transmitter.

The DAB receiver (Fig. 1.6) consists of the demodulator (DMOD), the demultiplexer 
(DMUX) and the source decoder (SD). The SD provides a linearly quantized PCM signal 
(Pulse Code Modulation). The PCM signal is fed over a Digital-to-Analog Converter (DA 
Converter) to an amplifier connected to loudspeakers. 




Figure 1.6 DAB receiver. 

For a more detailed description of the DAB transmission technique, an illustration 
based on filter banks is presented (see Fig. 1.7). The audio signal at a data rate of 768 kbit/s 
is decomposed into subbands with the help of an analysis filter bank (AFB). Quantization 
and coding based on psychoacoustic models are carried out within each subband. The data 
reduction leads to a data rate of 96-192 kbit/s. The quantized subband signals are provided 
with additional information (header) and combined together in a frame. This so-called
ISO-MPEG1 frame [ISO92] is first subjected to channel coding (CC). Time-interleaving (TIL)
follows and will be described later on. The individual transmitting programs are combined
in frequency multiplex (frequency-interleaving FIL) with a synthesis filter bank (SFB) into
one broad-band transmitting signal. The synthesis filter bank has several complex-valued
input signals and one complex-valued output signal. The real-valued band-pass signal is
obtained by modulating with $e^{j\omega_c t}$ and taking the real part. At the receiver, the
complex-valued base-band signal is obtained by demodulation followed by low-pass filtering.
The complex-valued analysis filter bank provides the complex-valued band-pass signals
from which the ISO-MPEG1 frame is formed after frequency and time deinterleaving
and channel decoding. The PCM signal is combined using the synthesis filter bank after 
extracting the subband signals from the frame. 





Figure 1.7 Filter banks within DAB. 



DAB Transmission Technique. The special problems of mobile communications are dealt 
with using a combination of the OFDM transmission technique with DPSK modulation 
and time and frequency interleaving. Possible disturbances are minimized by consecutive 
channel coding. The schematic diagram in Fig. 1.8 shows the relevant subsystems. 

For example, the transmission of a program $P_1$ which is delivered as an ISO-MPEG1
stream is shown in Fig. 1.8. The channel coding doubles the data rate. The typical
characteristics of a mobile communication channel like time and frequency selectivity are
handled by using time and frequency interleaving with the help of a multicarrier technique.

Figure 1.8 DAB transmission technique.



Burst disturbances of consecutive bits are reduced to single bit errors by spreading
the bits over a longer period of time. Narrow-band disturbances affect only individual
carriers because the transmitter program $P_1$ is spread in the frequency domain, i.e. the
carrier frequencies of the transmitter programs are distributed at a certain displacement. The remaining
disturbances of the mobile channel are suppressed with the help of channel coding, i.e. by
adding redundancy, and decoding with a Viterbi decoder. The implementation of an OFDM
transmission is discussed in the following.
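The effect of time interleaving on burst disturbances can be illustrated with a simple block interleaver. This is only a sketch of the principle: the row/column block interleaver, its dimensions and the burst position are assumptions made for the illustration and are not the interleaving scheme specified for DAB.

```python
def interleave(bits, rows, cols):
    """Write row by row into a rows x cols block, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): put every value back at its original index."""
    assert len(bits) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


ROWS, COLS = 4, 6                      # hypothetical block size
sent = list(range(ROWS * COLS))        # indices stand in for the bits
channel = interleave(sent, ROWS, COLS)

# A burst disturbance destroys four consecutive positions on the channel.
burst = set(range(8, 12))
received = [None if i in burst else v for i, v in enumerate(channel)]

restored = deinterleave(received, ROWS, COLS)
error_positions = [i for i, v in enumerate(restored) if v is None]
print(error_positions)                 # [2, 8, 14, 20]
```

After deinterleaving, the four destroyed positions are COLS samples apart, so a channel decoder sees isolated errors instead of one burst.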
OFDM Transmission. The OFDM transmission technique is shown in Fig. 1.9. The
technique stands out owing to its simple implementation in the digital domain. The data
sequence $c_t(k)$ which is to be transmitted is written blockwise into a register of length $2M$.
The complex numbers from $d_1(m)$ to $d_M(m)$ are formed from two consecutive bits (dibits).
Here the first bit corresponds to the real part and the second to the imaginary part. The
signal space shows the four states for the so-called QPSK [Kam92a, Pro89]. The vector
$d(m)$ is transformed with an inverse FFT (Fast Fourier Transform) into a vector $e(m)$
which describes the values of the transmitted symbol in the time domain. The transmitted
symbol $x_t(n)$ with period $T_{sym}$ is formed by the transmission of the $M$ complex numbers
$e_i(m)$ at sampling period $T_S$. The real-valued band-pass signal is formed at high frequency
after DA conversion of the quadrature signals, modulation by $e^{j\omega_c t}$ and by taking the real
part. At the receiver, the transmitted symbol becomes a complex-valued sequence $x_r(n)$
by demodulation with $e^{-j\omega_c t}$ and AD conversion of the quadrature signal. $M$ samples of
the received sequence $x_r(n)$ are distributed over the $M$ input values $f_i(m)$ and transformed
into the frequency domain with the help of FFT. The resulting complex numbers $g_i(m)$ are
again converted to dibits and provide the received sequence $c_r(k)$. Without the influence of
the communication channel, the transmitted sequence can be reconstructed exactly.
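The baseband part of this chain (dibits, QPSK mapping, inverse transform, forward transform, dibits) can be sketched in a few lines of Python. The sketch leaves out DA/AD conversion and the modulation onto the carrier, uses a naive DFT in place of the FFT, and assumes a mapping of bit 0 to -1 and bit 1 to +1; the function names and the bit pattern are hypothetical.

```python
import cmath


def dibits_to_symbols(bits):
    """Map bit pairs to QPSK points: first bit -> real part, second -> imaginary part."""
    def level(b):
        return 1.0 if b else -1.0      # assumed mapping: 0 -> -1, 1 -> +1
    return [complex(level(bits[i]), level(bits[i + 1]))
            for i in range(0, len(bits), 2)]


def symbols_to_dibits(symbols):
    """Recover the dibits from the signs of the real and imaginary parts."""
    bits = []
    for s in symbols:
        bits += [1 if s.real > 0 else 0, 1 if s.imag > 0 else 0]
    return bits


def dft(x, inverse=False):
    """Naive O(M^2) DFT; stands in for the FFT/inverse FFT of the text."""
    M = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(M):
        acc = sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / M)
                  for n in range(M))
        out.append(acc / M if inverse else acc)
    return out


M = 8
c_t = [1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1]  # register of length 2M
d = dibits_to_symbols(c_t)     # M complex numbers d_1(m) ... d_M(m)
e = dft(d, inverse=True)       # transmitted symbol in the time domain
g = dft(e)                     # receiver: back to the frequency domain
c_r = symbols_to_dibits(g)

print(c_r == c_t)              # True: exact reconstruction without a channel
```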



OFDM Transmission with a Guard Interval. In order to describe the OFDM transmission
with a guard interval, the schematic diagram in Fig. 1.10 is considered. The
transmission of a symbol of length $M$ over a channel with impulse response $h(n)$ of length
$L$ leads to a received signal $y(n)$ of length $M + L - 1$. This means that the received
symbol is longer than the transmitted signal. The exact reconstruction of the transmitted
symbol is disturbed because of the overlapping of received symbols. Reconstruction of
the transmitted symbol is possible by cyclic continuation of the transmitted symbol. Here,
the complex numbers from the vector $e(m)$ are repeated so as to give a symbol period of
$T_{sym} = (M + L) T_S$. Each of the transmitted symbols is, therefore, extended to a length
of $M + L$. After transmission over a channel with impulse response of length $L$, the
response of the channel is periodic with length $M$. After the initial transient state of the
channel, i.e. after the $L$ samples of the guard interval, the following $M$ samples are written
into a register. Since a time delay occurs between the start of the transmitted symbol
and the sampling shifted by $L$ displacements, it is necessary to shift the sequence of
length $M$ cyclically by $L$ displacements. The complex values $g_i(m)$ do not correspond
to the exact transmitted values $d_i(m)$ because of the transmission channel $h(n)$. However,
there is no influence of neighboring carrier frequencies. Every received value $g_i(m)$ is
weighted with the corresponding magnitude and phase of the channel at the specific
carrier frequency. The influence of the communication channel can be eliminated by
differential coding of consecutive dibits. The decoding process can be done according to
$z_i(m) = g_i(m)\, g_i^*(m - 1)$. The dibit corresponds to the sign of the real and imaginary parts.
The DAB transmission technique presented stands out owing to its simple implementation
with the help of FFT algorithms. The extension of the transmitted symbol by a length $L$
of the channel impulse response and the synchronization to collect the $M$ samples out of
the received symbol have still to be carried out. The length of the guard interval must be
matched to the maximum echo delay of the multipath channel. Owing to differential coding
of the transmitted sequence, an equalizer at the receiver is not necessary.
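The following Python sketch puts the guard interval and the differential decoding together in a baseband simulation. It is written under several assumptions that are not taken from the text: the channel taps are arbitrary, the cyclic continuation is realized by appending the first L samples of each symbol, the dibits are mapped to phase steps (±1 ± j)/√2, the first symbol is an all-ones reference, and a naive DFT replaces the FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(M^2) DFT; stands in for the FFT/inverse FFT."""
    M = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(M):
        acc = sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / M)
                  for n in range(M))
        out.append(acc / M if inverse else acc)
    return out


def convolve(x, h):
    """Linear convolution y(n) = x(n) * h(n)."""
    y = [0j] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


M, L = 8, 3
h = [1.0, 0.5, 0.25j]                  # hypothetical channel impulse response, length L

bits = [[1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1],
        [0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]]   # two blocks of 2M bits

# Differential coding per carrier: d_i(m) = d_i(m-1) * step_i(m).
d = [[1 + 0j] * M]                     # reference symbol
for block in bits:
    steps = [complex(1 if block[2 * i] else -1,
                     1 if block[2 * i + 1] else -1) / math.sqrt(2)
             for i in range(M)]
    d.append([prev * s for prev, s in zip(d[-1], steps)])

# Transmitter: inverse transform and cyclic continuation to length M + L.
stream = []
for dm in d:
    e = dft(dm, inverse=True)
    stream += e + e[:L]

y = convolve(stream, h)                # transmission over the channel

# Receiver: skip the L guard samples, take M samples, shift cyclically by L.
g = []
for m in range(len(d)):
    start = m * (M + L) + L
    r = y[start:start + M]
    g.append(dft([r[(n - L) % M] for n in range(M)]))

# Differential decoding z_i(m) = g_i(m) * conj(g_i(m-1)).
decoded = []
for m in range(1, len(g)):
    block = []
    for i in range(M):
        z = g[m][i] * g[m - 1][i].conjugate()
        block += [1 if z.real > 0 else 0, 1 if z.imag > 0 else 0]
    decoded.append(block)

print(decoded == bits)                 # True: no equalizer was needed
```

In this sketch every received value equals the transmitted value multiplied by the channel's frequency response at that carrier, which is why the product of consecutive received values only scales the phase step by a positive real factor.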
Figure 1.9 OFDM transmission.



Digital Radio Mondiale (DRM) 

In the interest of national and international broadcasting stations, a more widespread
program delivery across regional or worldwide areas is of specific importance. This
is accomplished by analog radio transmission in the frequency range below 30 MHz.
The limited audio quality of the amplitude modulation technique (channel bandwidth
9-10 kHz) with an audio bandwidth of 4.5 kHz leads to a low acceptance rate for this kind
of audio broadcasting. The digital transmission technique Digital Radio
Mondiale³ will replace the existing analog transmission systems. The digital transmission
is based on OFDM and the audio coding MPEG4-AAC in combination with SBR (Spectral
Band Replication).

³ http://www.drm.org.




Figure 1.10 OFDM transmission with a guard interval. 



Internet Audio 

The growth of the internet offers new distribution possibilities for information, and
especially for audio and video signals. The distribution of audio signals is mainly driven by
the MP3 format (MPEG-1 Audio Layer III [Bra94]) or by proprietary formats of different
companies. Compressed transmission is used because the data rate available to home users is
still low compared to the data rates of lossless audio and video formats. Since the transmission is based on
file transfer of packets, the data rates strongly depend on the providing server, the actual
internet traffic and the access point of the home computer. A real-time transfer of
high-quality music is still not possible. If the audio compression is high enough to achieve a just
acceptable audio quality, a real-time transfer with a streaming technology is possible, since
the file size is small and a transmission needs less time (see Fig. 1.11). For this, a receiver
needs a double memory filled with incoming packets of a coded audio file and an audio
decoder running in parallel. After a memory holding a sufficiently long audio portion has been
decoded, its content is transferred to the sound card of the computer. During playback of
the decoded audio signal, further incoming packets are received and decoded. Packet loss
can lead to interruptions in the audio signal. Several techniques for error concealment and
several protocols allow the transfer of coded audio.







Figure 1.11 Audio streaming via the internet. 



1.3 Storage Media 

Compact Disc 

The technological advances in the semiconductor industry have led to economical storage 
media for digitally encoded information. Independently of developments in the computer 
industry, the compact disc system was introduced by Philips and Sony in 1982. The storage 
of digital audio data is carried out on an optical storage medium. The compact disc operates 
at a sampling rate of fs = 44.1 kHz.⁴ The essential specifications are summarized in
Table 1.1. 

R-DAT (Rotary-head Digital Audio on Tape) 

The R-DAT system makes use of the helical-scan method for two-channel recording. The
available devices enable the recording of 16-bit PCM signals with all three sampling rates 
(Table 1.2) on a tape. R-DAT recorders are used in studio recording as well as in consumer 
applications. 

MiniDisc and MP3 Format 

Advanced coding techniques are based on psychoacoustics for data reduction. A widespread
storage system is the MiniDisc by Sony. The MiniDisc system operates with the ATRAC
technique (Adaptive Transform Acoustic Coding, [Tsu92]) and has a data rate of about
2 × 140 kbit/s for a stereo channel. A magneto-optical storage medium is used for recording.
The MP3 format was developed simultaneously, but the availability of recording and
playback systems has taken a longer time. Simple MP3 recorders and playback systems
are now available for the consumer market.

⁴ 3 × 490 × 30 Hz (NTSC) = 3 × 588 × 25 Hz (CCIR) = 44.1 kHz.

Table 1.1 Specifications of the CD system [Ben88].

Type of recording
  Signal recognition            Optical
  Storage density               682 Mbit/in²

Audio specification
  Number of channels            2
  Duration                      Approx. 60 min.
  Frequency range               20-20 000 Hz
  Dynamic range                 >90 dB
  THD                           <0.01%

Signal format
  Sampling rate                 44.1 kHz
  Quantization                  16-bit PCM (2's complement)
  Pre-emphasis                  None or 50/15 µs
  Error correction              CIRC
  Data rate                     2.034 Mbit/s
  Modulation                    EFM
  Channel bit rate              4.3218 Mbit/s
  Redundancy                    30%

Mechanical specification
  Diameter                      120 mm
  Thickness                     1.2 mm
  Diameter of the inner hole    15 mm
  Program range                 50-116 mm
  Reading speed                 1.2-1.4 m/s
                                500-200 r/min.



Super Audio Compact Disc (SACD) 

The SACD was specified by Philips and Sony in 1999 as a further development of the
compact disc with the objective of improved sound quality. The audio frequency range of
20 kHz is perceived by some listeners as a factor limiting audio quality, and the
anti-aliasing and reconstruction filters may lead to ringing resulting from linear phase filters.
This effect follows from short audio pulses leading to audible transients of the filters.
In order to overcome these problems the audio bandwidth is extended to 100 kHz and
the sampling frequency is increased to 2.8224 MHz (64 × 44.1 kHz). With this the filter
specifications can be met with simple first-order filters. The quantization of the samples is
based on a 1-bit quantizer within a delta-sigma converter structure which uses noise shaping
(see Fig. 1.12). The 1-bit signal with 2.8224 MHz sampling frequency is denoted as a DSD
signal (Direct Stream Digital). The DA conversion of a DSD signal into an analog signal
is accomplished with a simple analog first-order low-pass. The storage of DSD signals is
achieved by a special compact disc (Fig. 1.13) with a CD layer in PCM format and an HD
layer (High Density) with a DVD 4.38 GByte layer. The HD layer stores a stereo signal
in 1-bit DSD format and a 6-channel 1-bit signal with a lossless compression technique
(Direct Stream Transfer DST) [Jan03]. The CD layer of the SACD can be replayed with
a conventional CD player, whereas special SACD players can replay the HD layer. An
extensive discussion of 1-bit delta-sigma techniques can be found in [Lip01a, Lip01b,
Van01, Lip02, Van04].

Table 1.2 Specifications of the R-DAT system [Ben88].

Type of recording
  Signal recognition            Magnetic
  Storage capacity              2 GB

Audio specification
  Number of channels            2
  Duration                      Max. 120 min.
  Frequency range               20-20 000 Hz
  Dynamic range                 >90 dB
  THD                           <0.01%

Signal format
  Sampling rate                 48, 44.1, 32 kHz
  Quantization                  16-bit PCM (2's complement)
  Error correction              CIRC
  Channel coding                8/10 modulation
  Data rate                     2.46 Mbit/s
  Channel bit rate              9.4 Mbit/s

Mechanical specification
  Width of magnetic tape        3.8 mm
  Thickness                     13 µm
  Diameter of head drum         3 cm
  Revolutions per min.          2000 r/min.
  Rel. track speed              3.133 m/s
                                500-200 r/min.






Figure 1.12 SACD system. 
Figure 1.13 Layer of the SACD (labels: protection layer, CD layer, plastic, HD layer, plastic, laser scanner).



Digital Versatile Disc - Audio (DVD-A) 

To increase the storage capacity of the CD the Digital Versatile Disc (DVD) was developed. 
The physical dimensions are identical to the CD. The DVD has two layers with one or 
two sides, and the storage capacity per side has been increased. For a one-sided version 
for audio applications the storage capacity is 4.7 GB. A comparison of specifications for 
different disc media is shown in Table 1.3. Besides stereo signals with different sampling 
rates and word-lengths a variety of multi-channel formats can be stored. For data reduction 
a lossless compression technique, MLP (Meridian Lossless Packing), is applied. The 
improved audio quality compared to the CD audio is based on the higher sampling rates 
and word-lengths and the multichannel features of the DVD-A. 



Table 1.3 Specifications of CD, SACD and DVD-A. 



Parameter         CD              SACD             DVD-A
Coding            16-bit PCM      1-bit DSD        16-/20-/24-bit PCM
Sampling rate     44.1 kHz        2.8224 MHz       44.1/48/88.2/96/176.4/192 kHz
Channels          2               2-6              1-6
Compression       No              Yes (DST)        Yes (MLP)
Recording time    74 min.         70-80 min.       62-843 min.
Frequency range   20-20 000 Hz    20-100 000 Hz    20-96 000 Hz
Dynamic range     96 dB           120 dB           144 dB
Copy protection   No              Yes              Yes
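Two kinds of entries in Table 1.3 can be checked with simple arithmetic. The sketch below uses the common approximation that a w-bit linear PCM word spans 20·log10(2^w) dB, which is an assumption added here rather than a formula stated in this chapter, and reproduces the SACD sampling rate as 64 × 44.1 kHz. The 120 dB entry for the SACD does not follow from a word-length in this way, since the 1-bit format relies on noise shaping.

```python
import math


def pcm_dynamic_range_db(word_length_bits):
    """Ratio of full scale to one quantization step, 20*log10(2^w), in dB."""
    return 20 * math.log10(2 ** word_length_bits)


cd_db = pcm_dynamic_range_db(16)       # CD, 16-bit PCM
dvd_a_db = pcm_dynamic_range_db(24)    # DVD-A, 24-bit PCM

dsd_rate_hz = 64 * 44_100              # SACD sampling rate

print(round(cd_db, 1), "dB")           # 96.3 dB
print(round(dvd_a_db, 1), "dB")        # 144.5 dB
print(dsd_rate_hz / 1e6, "MHz")        # 2.8224 MHz
```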



1.4 Audio Components at Home 

Domestic digital storage media such as compact discs, personal computers
and MP3 players are already in use. They have digital outputs and can be connected to digital
post-processing systems right up to the loudspeakers. Individual sound control consists of
the following processing steps.

Equalizer

Spectral modification of the music signal in amplitude and phase and the automatic 
correction of the frequency response from loudspeaker to listening environment are desired. 



Room Simulation 

The simulation of room impulse responses and the processing of music signals with special 
room impulse response are used to give an impression of a room like a concert hall, a 
cathedral or a jazz club. 



Surround Systems 

Besides the reproduction of stereo signals from a CD over two frontal loudspeakers, more 
than two channels will be recorded in the prospective digital recording systems [Lin93]. 
This is already illustrated in the sound production for cinema movies where, besides the 
stereo signal (L, R), a middle channel (M) and two additional room signals (Lb, Rb)
are recorded. These surround systems are also used in the prospective digital television 
systems. The ambisonics technique [Ger85] is a recording technique that allows three- 
dimensional recording and reproduction of sound. 



Digital Amplifier Concepts 

The basis of a digital amplifier is pulse width modulation as shown in Fig. 1.14. With the
help of a fast counter, a pulse width modulated signal is formed out of the w-bit linearly
quantized signal. Single-sided and double-sided modulated conversion are used and they
are represented by two and three states, respectively. Single-sided modulation (2 states, -1
and +1) is performed by a counter which counts upward from zero with multiples of the
sampling rate. The number range of the PCM signal from -1 to +1 is directly mapped
onto the counter. The duration of the pulse width is controlled by a comparator. For pulse
width modulation with three states (-1, 0, +1), the sign of the PCM signal determines the
state. The pulse width is determined by a mapping of the number range from 0 to 1 onto
a counter. For double-sided modulation, an upward/downward counter is needed which
has to be clocked at twice the rate compared with single-sided modulation. The allocation
of pulse widths is shown in Fig. 1.14. In order to reduce the clock rate for the counter,
pulse width modulation is carried out after oversampling and noise shaping
of the quantization error (see Fig. 1.15, [Gol90]). Thus the clock rate of
the counter is reduced to 180.6 MHz. The input signal is first upsampled by a factor of
16 and then quantized to 8 bits with third-order noise shaping. The use of pulse shaping
with delta-sigma modulation is shown in Fig. 1.16 [And92]. Here a direct conversion of the
delta-sigma modulated 1-bit signal is performed. The pulse converter shapes the envelope
of the serial data bits. The low-pass filter reconstructs the analog signal. In order to reduce
nonlinear distortion, the output signal is fed back (see Fig. 1.17, [Klu92]). New methods
for the generation of pulse width modulation try to reduce the clock rates and the high
frequency components [Str99, Str01].
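The counter clock rates involved can be estimated with a short calculation, and the counter/comparator principle of single-sided modulation can be sketched in the same breath. The text does not state the sampling rate behind the 180.6 MHz figure; fs = 44.1 kHz is assumed here because it reproduces that number. The function names and the exact mapping of the sample onto the counter are illustrative assumptions.

```python
def pwm_counter_clock_hz(sampling_rate_hz, oversampling, word_length_bits):
    """Clock needed to resolve 2^w pulse widths within one (oversampled) sample period."""
    return sampling_rate_hz * oversampling * 2 ** word_length_bits


def single_sided_pwm(sample, word_length_bits):
    """One PWM period: +1 while the upward counter is below the threshold, else -1."""
    steps = 2 ** word_length_bits
    threshold = round((sample + 1.0) / 2.0 * steps)   # map [-1, +1] onto [0, steps]
    return [1 if count < threshold else -1 for count in range(steps)]


FS = 44_100                                   # assumed sampling rate in Hz

direct = pwm_counter_clock_hz(FS, 1, 16)      # 16-bit PCM without oversampling
shaped = pwm_counter_clock_hz(FS, 16, 8)      # 16-fold oversampling, 8-bit words

print(direct / 1e9, "GHz")                    # 2.8901376 GHz
print(shaped / 1e6, "MHz")                    # 180.6336 MHz

pulse = single_sided_pwm(0.5, 8)
print(sum(pulse) / len(pulse))                # 0.5: the mean of the pulse equals the sample
```

Under these assumptions, oversampling by 16 and requantizing to 8 bits lowers the counter clock by a factor of 16 compared with converting the 16-bit signal directly.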




Figure 1.14 Pulse width modulation (panel titles: double-sided PWM with 2 states, +1, -1; double-sided PWM with 3 states, +1, 0, -1).



Figure 1.15 Pulse width modulation with oversampling and noise shaping.





Figure 1.17 Delta-sigma modulated amplifier with feedback. 



Digital Crossover 

In order to perform digital crossovers for loudspeakers, a linear phase decomposition of the
signal with a special filter bank [Zöl92] is done (Fig. 1.18). In a first step, the input signal
is decomposed into its high-pass and low-pass components and the high-pass signal is fed
to a DAC over a delay unit. In the next step, the low-pass signal is further decomposed.
The individual band-pass signals and the low-pass signal are then fed to the respective
loudspeaker. Further developments for the control of loudspeakers can be found in [Kli94,
Kli98a, Kli98b, Mül99].
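The idea of a stepwise linear phase decomposition with delay compensation can be sketched with complementary filters. This is not the special filter bank cited above: the moving-average low-pass filters, their lengths and the function names are placeholders chosen only to show why the high-pass branch needs a delay unit.

```python
def fir(x, h):
    """Causal FIR filtering, output truncated to the length of x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def delay(x, d):
    """Delay x by d samples, keeping the length."""
    return [0.0] * d + x[:len(x) - d]


def split(x, taps):
    """Complementary linear phase split: low = FIR(x), high = delayed x - low."""
    h = [1.0 / taps] * taps          # placeholder symmetric low-pass (odd length)
    d = (taps - 1) // 2              # group delay of the linear phase filter
    low = fir(x, h)
    high = [a - b for a, b in zip(delay(x, d), low)]
    return low, high, d


x = [0.0] * 4 + [1.0] + [0.0] * 27   # test impulse

low1, high, d1 = split(x, 5)         # first step: high-pass and low-pass
low2, band, d2 = split(low1, 9)      # next step: low-pass further decomposed

high_delayed = delay(high, d2)       # delay unit in the high-pass branch

total = [a + b + c for a, b, c in zip(high_delayed, band, low2)]
expected = delay(x, d1 + d2)
print(max(abs(a - b) for a, b in zip(total, expected)) < 1e-12)   # True
```

With the delay in place, the sum of all branch signals is the input delayed by d1 + d2 samples, so the decomposition itself adds no amplitude or phase distortion.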
Figure 1.18 Digital crossover with outputs high-pass, band-pass 1, band-pass 2 and low-pass (FS, frequency splitting; TC, transition bandwidth control; DEL, delay).



Chapter 2

Quantization 



Basic operations for AD conversion of a continuous-time signal $x(t)$ are the sampling
and quantization of $x(n)$ yielding the quantized sequence $x_Q(n)$ (see Fig. 2.1). Before
discussing AD/DA conversion techniques and the choice of the sampling frequency
$f_S = 1/T_S$ in Chapter 3, we will introduce the quantization of the samples $x(n)$ with a finite
number of bits. The digitization of a sampled signal with continuous amplitude is called
quantization. The effects of quantization starting with the classical quantization model are
discussed in Section 2.1. In Section 2.2 dither techniques are presented which, for low-level
signals, linearize the process of quantization. In Section 2.3 spectral shaping of quantization
errors is described. Section 2.4 deals with number representations for digital audio signals
and their effects on algorithms.



Figure 2.1 AD conversion and quantization.



2.1 Signal Quantization 

2.1.1 Classical Quantization Model 

Quantization is described by Widrow's quantization theorem [Wid61]. It states that a
quantizer can be modeled as the addition of a uniformly distributed random
signal e(n) to the original signal x(n) (see Fig. 2.2). This additive model,

$$x_Q(n) = x(n) + e(n), \tag{2.1}$$

is based on the difference between quantized output and input according to the error signal

$$e(n) = x_Q(n) - x(n). \tag{2.2}$$

Figure 2.2 Quantization.



This linear model of the output $x_Q(n)$ is valid only when the input amplitude has a
wide dynamic range and the quantization error e(n) is not correlated with the signal x(n).
Owing to the statistical independence of consecutive quantization errors, the autocorrelation
of the error signal is given by $r_{EE}(m) = \sigma_E^2 \cdot \delta(m)$, yielding a power density spectrum
$S_{EE}(e^{j\Omega}) = \sigma_E^2$.

The nonlinear process of quantization is described by a nonlinear characteristic curve as
shown in Fig. 2.3a, where Q denotes the quantization step. The difference between output
and input of the quantizer provides the quantization error $e(n) = x_Q(n) - x(n)$, which is
shown in Fig. 2.3b. The uniform probability density function (PDF) of the quantization
error is given (see Fig. 2.3b) by

$$p_E(e) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e}{Q}\right). \tag{2.3}$$

Figure 2.3 (a) Nonlinear characteristic curve of a quantizer. (b) Quantization error e and its
probability density function (PDF) $p_E(e)$.



The mth moment of a random variable E with a PDF $p_E(e)$ is defined as the expected
value of $E^m$:

$$E\{E^m\} = \int_{-\infty}^{\infty} e^m\, p_E(e)\, de. \tag{2.4}$$
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For a uniform distributed random process, as in (2.3), the first two moments are given by 



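$$m_E = E\{E\} = 0 \quad \text{(mean value)}, \tag{2.5}$$

$$\sigma_E^2 = E\{E^2\} = \frac{Q^2}{12} \quad \text{(variance)}. \tag{2.6}$$

The signal-to-noise ratio

$$\mathrm{SNR} = 10 \log_{10}\!\left(\frac{\sigma_X^2}{\sigma_E^2}\right)\ \mathrm{dB} \tag{2.7}$$

is defined as the ratio of signal power $\sigma_X^2$ to error power $\sigma_E^2$.

For a quantizer with input range $\pm x_{\max}$ and word-length w, the quantization step size
can be expressed as

$$Q = 2x_{\max}/2^{w}. \tag{2.8}$$

By defining a peak factor,

$$P_F = \frac{x_{\max}}{\sigma_X} = \frac{2^{w-1}Q}{\sigma_X}, \tag{2.9}$$

the variances of the input and the quantization error can be written as

$$\sigma_X^2 = \frac{x_{\max}^2}{P_F^2}, \tag{2.10}$$

$$\sigma_E^2 = \frac{Q^2}{12} = \frac{1}{12}\,\frac{x_{\max}^2}{2^{2w}}\,2^2 = \frac{1}{3}\,x_{\max}^2\,2^{-2w}. \tag{2.11}$$

The signal-to-noise ratio is then given by

$$\mathrm{SNR} = 10 \log_{10}\!\left(\frac{x_{\max}^2/P_F^2}{\frac{1}{3}x_{\max}^2 2^{-2w}}\right) = 10 \log_{10}\!\left(2^{2w}\,\frac{3}{P_F^2}\right) = 6.02\,w - 10 \log_{10}\!\left(P_F^2/3\right)\ \mathrm{dB}. \tag{2.12}$$

A sinusoidal signal (PDF as in Fig. 2.4) with $P_F = \sqrt{2}$ gives

$$\mathrm{SNR} = 6.02\,w + 1.76\ \mathrm{dB}. \tag{2.13}$$

For a signal with uniform PDF (see Fig. 2.4) and $P_F = \sqrt{3}$ we can write

$$\mathrm{SNR} = 6.02\,w\ \mathrm{dB}, \tag{2.14}$$

and for a Gaussian distributed signal (probability of overload $< 10^{-5}$ leads to $P_F = 4.61$,
see Fig. 2.5), it follows that

$$\mathrm{SNR} = 6.02\,w - 8.5\ \mathrm{dB}. \tag{2.15}$$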

It is obvious that the signal-to-noise ratio depends on the PDF of the input. For digital 
audio signals that exhibit nearly Gaussian distribution, the maximum signal-to-noise ratio 
for given word-length w is 8.5 dB lower than the rule-of-thumb formula (2.14) for the 
signal-to-noise ratio. 
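As a numerical check of (2.12) to (2.15), the following Python sketch evaluates the SNR formula for the three peak factors and measures the SNR of a rounded full-scale sinusoid. The function names, the rounding quantizer `q * floor(x/q + 0.5)` without overload handling, and the test frequency are illustrative assumptions and are not taken from the text.

```python
import math

def snr_formula_db(w, peak_factor):
    """SNR according to (2.12): 6.02*w - 10*log10(P_F^2 / 3) in dB."""
    return 20.0 * w * math.log10(2.0) - 10.0 * math.log10(peak_factor ** 2 / 3.0)

def quantize(x, w, x_max=1.0):
    """Rounding quantizer with step size Q = 2*x_max / 2^w (assumed, no clipping)."""
    q = 2.0 * x_max / 2 ** w
    return q * math.floor(x / q + 0.5)

def measured_sine_snr_db(w, n_samples=20000, rel_freq=0.0123456789):
    """Measure the SNR of a full-scale sinusoid quantized to w bits."""
    signal_power = 0.0
    error_power = 0.0
    for n in range(n_samples):
        x = math.sin(2.0 * math.pi * rel_freq * n)
        e = quantize(x, w) - x          # error signal as in (2.2)
        signal_power += x * x
        error_power += e * e
    return 10.0 * math.log10(signal_power / error_power)

if __name__ == "__main__":
    cases = (("sinusoid", math.sqrt(2.0)), ("uniform", math.sqrt(3.0)), ("Gaussian", 4.61))
    for name, pf in cases:
        print(name, round(snr_formula_db(16, pf), 2), "dB")
    print("measured sinusoid, w = 8:", round(measured_sine_snr_db(8), 2), "dB")
```

The measured value for the sinusoid should lie close to the formula value; small deviations are expected because the error of a deterministic input is only approximately uniform.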
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Figure 2.4 Probability density function (sinusoidal signal and signal with uniform PDF). 



Figure 2.5 Probability density function (signal with Gaussian PDF).



2.1.2 Quantization Theorem 



The statement of the quantization theorem for amplitude sampling (digitizing the ampli- 
tude) of signals was given by Widrow [Wid61]. The analogy for digitizing the time axis is 
the sampling theorem due to Shannon [Sha48]. Figure 2.6 shows the amplitude quantization 
and the time quantization. First of all, the PDF of the output signal of a quantizer is 
determined in terms of the PDF of the input signal. Both probability density functions 
are shown in Fig. 2.7. The respective characteristic functions (Fourier transform of a PDF) 
of the input and output signals form the basis for Widrow’s quantization theorem. 
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Figure 2.6 Amplitude and time quantization. 
Figure 2.7 Probability density function of signal x(n) and quantized signal $x_Q(n)$.



First-order Statistics of the Quantizer Output 

Quantization of a continuous-amplitude signal x with PDF $p_X(x)$ leads to a
discrete-amplitude signal y with PDF $p_Y(y)$ (see Fig. 2.8). The continuous PDF of the input is
sampled by integrating over all quantization intervals (zone sampling). This leads to a
discrete PDF of the output.

In the quantization intervals, the discrete PDF of the output is determined by the
probability

$$W[kQ] = W\!\left[-\frac{Q}{2} + kQ \le x < \frac{Q}{2} + kQ\right] = \int_{-Q/2+kQ}^{Q/2+kQ} p_X(x)\, dx. \tag{2.16}$$
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Figure 2.8 Zone sampling of the PDF. 
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The PDF of the output can hence be determined by convolution of a rect function [Lip92] 
with the PDF of the input. This is followed by an amplitude sampling with resolution Q as 
described in (2.23) (see Fig. 2.9). 




Figure 2.9 Determining PDF of the output.



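Using $\mathrm{FT}\{f_1(t) \cdot f_2(t)\} = \frac{1}{2\pi} F_1(j\omega) * F_2(j\omega)$, the characteristic function (Fourier
transform of $p_Y(y)$) can be written, with $u_o = 2\pi/Q$, as

$$P_Y(ju) = \frac{1}{2\pi}\, u_o \sum_{k=-\infty}^{\infty} \delta(u - ku_o) * \left[ Q\, \frac{\sin(uQ/2)}{uQ/2} \cdot P_X(ju) \right] \tag{2.24}$$

$$= \sum_{k=-\infty}^{\infty} \delta(u - ku_o) * \left[ \frac{\sin(uQ/2)}{uQ/2} \cdot P_X(ju) \right], \tag{2.25}$$

$$P_Y(ju) = \sum_{k=-\infty}^{\infty} P_X(ju - jku_o)\, \frac{\sin[(u - ku_o)Q/2]}{(u - ku_o)Q/2}. \tag{2.26}$$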



Equation (2.26) describes the sampling of the continuous PDF of the input. If the
quantization frequency $u_o = 2\pi/Q$ is at least twice the highest frequency of the characteristic
function $P_X(ju)$, then the periodically recurring spectra do not overlap. Hence, a reconstruction
of the PDF of the input $p_X(x)$ from the quantized PDF of the output $p_Y(y)$ is possible (see
Fig. 2.10). This is known as Widrow's quantization theorem. Contrary to the first sampling
theorem (Shannon's sampling theorem, ideal amplitude sampling in the time domain),
$F^A(j\omega) = \frac{1}{T} \sum_{k=-\infty}^{\infty} F(j\omega - jk\omega_o)$, it can be observed that there is an additional
multiplication of the periodic characteristic function with
$\frac{\sin[(u - ku_o)Q/2]}{(u - ku_o)Q/2}$ (see (2.26)).



If the base-band of the characteristic function (k = 0)

$$P_Y(ju) = P_X(ju)\, \underbrace{\frac{\sin(uQ/2)}{uQ/2}}_{P_E(ju)} \tag{2.27}$$

is considered, it is observed that it is a product of two characteristic functions. The
multiplication of characteristic functions leads to the convolution of PDFs from which
the addition of two statistically independent signals can be concluded. The characteristic
function of the quantization error is thus

$$P_E(ju) = \frac{\sin(uQ/2)}{uQ/2} \tag{2.28}$$
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Figure 2.10 Spectral representation. 



and the PDF

$$p_E(e) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e}{Q}\right) \tag{2.29}$$

(see Fig. 2.11).

Figure 2.11 PDF and characteristic function of quantization error.



The modeling of the quantization process as an addition of a statistically independent 
noise signal to the input signal leads to a continuous PDF of the output (see Fig. 2.12, 
convolution of PDFs and sampling in the interval Q gives the discrete PDF of the output). 
The PDF of the discrete-valued output comprises Dirac pulses at distance Q with values 
equal to the continuous PDF (see (2.23)). Only if the quantization theorem is valid can the 
continuous PDF be reconstructed from the discrete PDF. 




Figure 2.12 PDF of model. 



In many cases, it is not necessary to reconstruct the PDF of the input. It is sufficient to
calculate the moments of the input from the output. The mth moment can be expressed in
terms of the PDF or the characteristic function:

$$E\{Y^m\} = \int_{-\infty}^{\infty} y^m\, p_Y(y)\, dy \tag{2.30}$$

$$= (-j)^m \left. \frac{d^m P_Y(ju)}{du^m} \right|_{u=0}. \tag{2.31}$$

If the quantization theorem is satisfied then the periodic terms in (2.26) do not overlap and
the mth derivative of $P_Y(ju)$ is solely determined by the base-band¹ so that, with (2.26),

$$E\{Y^m\} = (-j)^m \left. \frac{d^m}{du^m} \left[ P_X(ju)\, \frac{\sin(uQ/2)}{uQ/2} \right] \right|_{u=0}. \tag{2.32}$$

With (2.32), the first two moments can be determined as

$$m_Y = E\{Y\} = E\{X\}, \tag{2.33}$$

$$\sigma_Y^2 = E\{Y^2\} = E\{X^2\} + \frac{Q^2}{12}. \tag{2.34}$$
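The moment relations (2.33) and (2.34) can be checked by simulation. The following Python sketch is only an illustration under assumed conditions: a zero-mean Gaussian input with standard deviation equal to Q (which satisfies the quantization theorem to a very good approximation), a rounding quantizer, and a fixed random seed. The function names are hypothetical.

```python
import math
import random

def quantize(x, q):
    """Rounding quantizer with step size q."""
    return q * math.floor(x / q + 0.5)

def output_moments(sigma, q, n_samples=200000, seed=1):
    """Estimate (E{X}, E{Y}, E{X^2}, E{Y^2}) for a Gaussian input x and output y."""
    rng = random.Random(seed)
    sum_x = sum_y = sum_xx = sum_yy = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        y = quantize(x, q)
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
    n = float(n_samples)
    return sum_x / n, sum_y / n, sum_xx / n, sum_yy / n

if __name__ == "__main__":
    m_x, m_y, p_x, p_y = output_moments(sigma=1.0, q=1.0)
    print("E{Y} - E{X}     =", m_y - m_x)          # expected near 0, see (2.33)
    print("E{Y^2} - E{X^2} =", p_y - p_x)          # expected near Q^2/12, see (2.34)
    print("Q^2/12          =", 1.0 / 12.0)
```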



Second-order Statistics of Quantizer Output 

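In order to describe the properties of the output in the frequency domain, two output values
$Y_1$ (at time $n_1$) and $Y_2$ (at time $n_2$) are considered [Lip92]. For the joint density function,

$$p_{Y_1Y_2}(y_1, y_2) = \delta_{QQ}(y_1, y_2) \left[ \mathrm{rect}\!\left(\frac{y_1}{Q}, \frac{y_2}{Q}\right) * p_{X_1X_2}(y_1, y_2) \right] \tag{2.35}$$

with

$$\delta_{QQ}(y_1, y_2) = \delta_Q(y_1) \cdot \delta_Q(y_2) \tag{2.36}$$

and

$$\mathrm{rect}\!\left(\frac{y_1}{Q}, \frac{y_2}{Q}\right) = \mathrm{rect}\!\left(\frac{y_1}{Q}\right) \cdot \mathrm{rect}\!\left(\frac{y_2}{Q}\right). \tag{2.37}$$

For the two-dimensional Fourier transform, it follows that

$$P_{Y_1Y_2}(ju_1, ju_2) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \delta(u_1 - ku_o)\,\delta(u_2 - lu_o) * \left[ \frac{\sin(u_1 Q/2)}{u_1 Q/2} \cdot \frac{\sin(u_2 Q/2)}{u_2 Q/2} \cdot P_{X_1X_2}(ju_1, ju_2) \right] \tag{2.38}$$

$$= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} P_{X_1X_2}(ju_1 - jku_o, ju_2 - jlu_o)\, \frac{\sin[(u_1 - ku_o)Q/2]}{(u_1 - ku_o)Q/2} \cdot \frac{\sin[(u_2 - lu_o)Q/2]}{(u_2 - lu_o)Q/2}. \tag{2.39}$$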



1 This is also valid owing to the weaker condition of Sripad and Snyder [Sri77] discussed in the next section. 
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Similar to the one-dimensional quantization theorem, a two-dimensional theorem 
[Wid61] can be formulated: the joint density function of the input can be reconstructed
from the joint density function of the output, if $P_{X_1X_2}(ju_1, ju_2) = 0$ for $u_1 > u_o/2$ and
$u_2 > u_o/2$. Here again, the moments of the joint density function can be calculated as
follows:

$$E\{Y_1^m Y_2^n\} = (-j)^{m+n} \left. \frac{\partial^{m+n}}{\partial u_1^m\, \partial u_2^n} \left[ P_{X_1X_2}(ju_1, ju_2)\, \frac{\sin(u_1 Q/2)}{u_1 Q/2}\, \frac{\sin(u_2 Q/2)}{u_2 Q/2} \right] \right|_{u_1=0,\,u_2=0}. \tag{2.40}$$

From this, the autocorrelation function with $m = n_2 - n_1$ can be written as

$$r_{YY}(m) = E\{Y_1Y_2\}(m) = \begin{cases} E\{X^2\} + \dfrac{Q^2}{12}, & m = 0, \\[2mm] E\{X_1X_2\}(m), & \text{elsewhere} \end{cases} \tag{2.41}$$

(for m = 0 we obtain (2.34)).



2.1.3 Statistics of Quantization Error 

First-order Statistics of Quantization Error 



The PDF of the quantization error depends on the PDF of the input and is dealt with in
the following. The quantization error $e = x_Q - x$ is restricted to the interval $[-\frac{Q}{2}, \frac{Q}{2}]$. It
depends linearly on the input (see Fig. 2.13). If the input value lies in the interval $[-\frac{Q}{2}, \frac{Q}{2}]$
then the error is $e = 0 - x$. For the PDF we obtain $p_E(e) = p_X(e)$. If the input value lies
in the interval $[-\frac{Q}{2} + Q, \frac{Q}{2} + Q]$ then the quantization error is $e = Q\lfloor Q^{-1}x + 0.5 \rfloor - x$
and is again restricted to $[-\frac{Q}{2}, \frac{Q}{2}]$. The PDF of the quantization error is consequently
$p_E(e) = p_X(e + Q)$ and is added to the first term. For the sum over all intervals we can
write

$$p_E(e) = \begin{cases} \displaystyle\sum_{k=-\infty}^{\infty} p_X(e - kQ), & -\dfrac{Q}{2} \le e < \dfrac{Q}{2}, \\[2mm] 0, & \text{elsewhere.} \end{cases} \tag{2.42}$$

Figure 2.13 Probability density function and quantization error.

Because of the restricted values of the variable of the PDF, we can write

$$p_E(e) = \mathrm{rect}\!\left(\frac{e}{Q}\right) \sum_{k=-\infty}^{\infty} p_X(e - kQ) \tag{2.43}$$

$$= \mathrm{rect}\!\left(\frac{e}{Q}\right) \left[ p_X(e) * \delta_Q(e) \right]. \tag{2.44}$$
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The PDF of the quantization error is determined by the PDF of the input and can be 
computed by shifting and windowing a zone. All individual zones are summed up for 
calculating the PDF of the quantization error [Lip92]. A simple graphical interpretation 
of this overlapping is shown in Fig. 2.14. The overlapping leads to a uniform distribution 
of the quantization error if the input PDF px(x) is spread over a sufficient number of 
quantization intervals. 



Figure 2.14 Probability density function of the quantization error.



For the Fourier transform of the PDF from (2.44) it follows that

$$P_E(ju) = \frac{1}{2\pi}\, Q\, \frac{\sin(uQ/2)}{uQ/2} * \left[ P_X(ju)\, \frac{2\pi}{Q} \sum_{k=-\infty}^{\infty} \delta(u - ku_o) \right] \tag{2.45}$$

$$= \frac{\sin(uQ/2)}{uQ/2} * \left[ \sum_{k=-\infty}^{\infty} P_X(jku_o)\,\delta(u - ku_o) \right] \tag{2.46}$$

$$= \sum_{k=-\infty}^{\infty} P_X(jku_o) \left[ \frac{\sin(uQ/2)}{uQ/2} * \delta(u - ku_o) \right], \tag{2.47}$$

$$P_E(ju) = \sum_{k=-\infty}^{\infty} P_X(jku_o)\, \frac{\sin[(u - ku_o)Q/2]}{(u - ku_o)Q/2}. \tag{2.48}$$

If the quantization theorem is satisfied, i.e. if $P_X(ju) = 0$ for $u > u_o/2$, then there is only
one non-zero term (k = 0 in (2.48)). The characteristic function of the quantization error is
reduced, with $P_X(0) = 1$, to

$$P_E(ju) = \frac{\sin(uQ/2)}{uQ/2}. \tag{2.49}$$
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Hence, the PDF of the quantization error is 



$$p_E(e) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e}{Q}\right). \tag{2.50}$$

Sripad and Snyder [Sri77] have modified the sufficient condition of Widrow (band-limited
characteristic function of input) for a quantization error of uniform PDF by the weaker
condition

$$P_X(jku_o) = P_X\!\left(j\frac{2\pi k}{Q}\right) = 0 \quad \text{for all } k \ne 0. \tag{2.51}$$

The uniform distribution of the input PDF,

$$p_X(x) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{x}{Q}\right), \tag{2.52}$$

with characteristic function

$$P_X(ju) = \frac{\sin(uQ/2)}{uQ/2}, \tag{2.53}$$

does not satisfy Widrow's condition for a band-limited characteristic function, but instead
the weaker condition,

$$P_X\!\left(j\frac{2\pi k}{Q}\right) = \frac{\sin(\pi k)}{\pi k} = 0 \quad \text{for all } k \ne 0, \tag{2.54}$$



is fulfilled. From this follows the uniform PDF (2.49) of the quantization error. The weaker 
condition from Sripad and Snyder extends the class of input signals for which a uniform 
PDF of the quantization error can be assumed. 
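The Sripad-Snyder condition (2.51) is easy to evaluate numerically for a uniform input PDF. The Python sketch below is an illustration only: it assumes a zero-mean uniform PDF of adjustable width, so that (2.53) generalizes to $\sin(u\,\text{width}/2)/(u\,\text{width}/2)$, and the function names are hypothetical.

```python
import math

def char_fn_uniform(u, width):
    """Characteristic function sin(u*width/2)/(u*width/2) of a zero-mean uniform PDF."""
    arg = u * width / 2.0
    return 1.0 if arg == 0.0 else math.sin(arg) / arg

def sripad_snyder_values(width, q, k_max=5):
    """Evaluate P_X(j*2*pi*k/Q) for k = 1..k_max, the quantities tested in (2.51)."""
    return [char_fn_uniform(2.0 * math.pi * k / q, width) for k in range(1, k_max + 1)]

if __name__ == "__main__":
    # Width Q: all values vanish (up to rounding), so the error PDF is uniform.
    print([round(v, 6) for v in sripad_snyder_values(width=1.0, q=1.0)])
    # Width 1.5 Q: the values do not vanish, so the error PDF deviates from uniform.
    print([round(v, 6) for v in sripad_snyder_values(width=1.5, q=1.0)])
```

For a width of exactly Q the sampled characteristic function is zero for every $k \ne 0$, in line with (2.54), whereas a width of 1.5 Q leaves non-zero terms in (2.48).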

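In order to show the deviation from the uniform PDF of the quantization error as a
function of the PDF of the input, (2.48) can be written as

$$P_E(ju) = P_X(0)\, \frac{\sin[uQ/2]}{uQ/2} + \sum_{k=-\infty,\,k\ne0}^{\infty} P_X\!\left(j\frac{2\pi k}{Q}\right) \frac{\sin[(u - ku_o)Q/2]}{(u - ku_o)Q/2}$$

$$= \frac{\sin[uQ/2]}{uQ/2} + \sum_{k=-\infty,\,k\ne0}^{\infty} P_X\!\left(j\frac{2\pi k}{Q}\right) \frac{\sin[uQ/2]}{uQ/2} * \delta(u - ku_o). \tag{2.55}$$

The inverse Fourier transform yields

$$p_E(e) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e}{Q}\right) \left[ 1 + \sum_{k=-\infty,\,k\ne0}^{\infty} P_X\!\left(j\frac{2\pi k}{Q}\right) \exp\!\left(-j\frac{2\pi k e}{Q}\right) \right] \tag{2.56}$$

$$= \begin{cases} \dfrac{1}{Q} \left[ 1 + \displaystyle\sum_{k=-\infty,\,k\ne0}^{\infty} P_X\!\left(j\dfrac{2\pi k}{Q}\right) \exp\!\left(-j\dfrac{2\pi k e}{Q}\right) \right], & -\dfrac{Q}{2} \le e < \dfrac{Q}{2}, \\[2mm] 0, & \text{elsewhere.} \end{cases} \tag{2.57}$$

Equation (2.56) shows the effect of the input PDF on the deviation from a uniform PDF.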
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Second-order Statistics of Quantization Error 

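To describe the spectral properties of the error signal, two values $E_1$ (at time $n_1$) and $E_2$
(at time $n_2$) are considered [Lip92]. The joint PDF is given by

$$p_{E_1E_2}(e_1, e_2) = \mathrm{rect}\!\left(\frac{e_1}{Q}, \frac{e_2}{Q}\right) \left[ p_{X_1X_2}(e_1, e_2) * \delta_{QQ}(e_1, e_2) \right]. \tag{2.58}$$

Here $\delta_{QQ}(e_1, e_2) = \delta_Q(e_1) \cdot \delta_Q(e_2)$ and $\mathrm{rect}(e_1/Q, e_2/Q) = \mathrm{rect}(e_1/Q) \cdot \mathrm{rect}(e_2/Q)$. For
the Fourier transform of the joint PDF, a similar procedure to that shown by (2.45)-(2.48)
leads to

$$P_{E_1E_2}(ju_1, ju_2) = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} P_{X_1X_2}(jk_1u_o, jk_2u_o)\, \frac{\sin[(u_1 - k_1u_o)Q/2]}{(u_1 - k_1u_o)Q/2} \cdot \frac{\sin[(u_2 - k_2u_o)Q/2]}{(u_2 - k_2u_o)Q/2}. \tag{2.59}$$

If the quantization theorem and/or the Sripad-Snyder condition

$$P_{X_1X_2}(jk_1u_o, jk_2u_o) = 0 \quad \text{for all } k_1, k_2 \ne 0 \tag{2.60}$$

are satisfied then

$$P_{E_1E_2}(ju_1, ju_2) = \frac{\sin[u_1Q/2]}{u_1Q/2} \cdot \frac{\sin[u_2Q/2]}{u_2Q/2}. \tag{2.61}$$

Thus, for the joint PDF of the quantization error,

$$p_{E_1E_2}(e_1, e_2) = \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e_1}{Q}\right) \cdot \frac{1}{Q}\,\mathrm{rect}\!\left(\frac{e_2}{Q}\right), \quad -\frac{Q}{2} \le e_1, e_2 < \frac{Q}{2} \tag{2.62}$$

$$= p_{E_1}(e_1) \cdot p_{E_2}(e_2). \tag{2.63}$$

Due to the statistical independence of quantization errors (2.63),

$$E\{E_1^m E_2^n\} = E\{E_1^m\} \cdot E\{E_2^n\}. \tag{2.64}$$

For the moments of the joint PDF,

$$E\{E_1^m E_2^n\} = (-j)^{m+n} \left. \frac{\partial^{m+n}}{\partial u_1^m\, \partial u_2^n}\, P_{E_1E_2}(ju_1, ju_2) \right|_{u_1=0,\,u_2=0}. \tag{2.65}$$

From this, it follows for the autocorrelation function with $m = n_2 - n_1$ that

$$r_{EE}(m) = E\{E_1E_2\} = \begin{cases} E\{E^2\}, & m = 0, \\ E\{E_1E_2\}, & \text{elsewhere} \end{cases} \tag{2.66}$$

$$= \begin{cases} \dfrac{Q^2}{12}, & m = 0, \\[2mm] 0, & \text{elsewhere} \end{cases} \tag{2.67}$$

$$= \frac{Q^2}{12}\,\delta(m). \tag{2.68}$$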
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The power density spectrum of the quantization error is then given by 

$$S_{EE}(e^{j\Omega}) = \sum_{m=-\infty}^{+\infty} r_{EE}(m)\, e^{-j\Omega m} = \frac{Q^2}{12}, \tag{2.69}$$

which is equal to the variance $\sigma_E^2 = Q^2/12$ of the quantization error (see Fig. 2.15).

Figure 2.15 Autocorrelation $r_{EE}(m)$ and power density spectrum $S_{EE}(e^{j\Omega})$ of quantization
error e(n).
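The whiteness of the quantization error stated by (2.68) and (2.69) can be illustrated with a correlated input signal. The Python sketch below is a hedged example: the first-order autoregressive Gaussian input, its parameters, the rounding quantizer and the function name are assumptions chosen so that the input spans many quantization steps.

```python
import math
import random

def error_autocorrelation(q=1.0, sigma_x=10.0, a=0.9, n_samples=100000, max_lag=3, seed=2):
    """Estimate r_EE(m), m = 0..max_lag, for a rounded autoregressive Gaussian input."""
    rng = random.Random(seed)
    drive_sigma = sigma_x * math.sqrt(1.0 - a * a)   # keeps the input variance at sigma_x^2
    x = 0.0
    errors = []
    for _ in range(n_samples):
        x = a * x + rng.gauss(0.0, drive_sigma)      # correlated input sample
        errors.append(q * math.floor(x / q + 0.5) - x)
    r = []
    for m in range(max_lag + 1):
        acc = sum(errors[n] * errors[n + m] for n in range(n_samples - m))
        r.append(acc / (n_samples - m))
    return r

if __name__ == "__main__":
    r_ee = error_autocorrelation()
    print("r_EE(0) =", r_ee[0], " Q^2/12 =", 1.0 / 12.0)
    print("r_EE(1..3) =", r_ee[1:])                  # expected to be close to zero
```

Although neighbouring input samples are strongly correlated, the estimated error autocorrelation is concentrated at m = 0, which corresponds to a flat power density spectrum.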
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with the characteristic function 



$$P_X(ju) = \exp\!\left(\frac{-u^2\sigma^2}{2}\right). \tag{2.76}$$

Using (2.57), the PDF of the quantization error is then given by

$$p_E(e) = \begin{cases} \dfrac{1}{Q} \left[ 1 + 2 \displaystyle\sum_{k=1}^{\infty} \cos\!\left(\dfrac{2\pi k e}{Q}\right) \exp\!\left(-\dfrac{2\pi^2 k^2 \sigma^2}{Q^2}\right) \right], & -\dfrac{Q}{2} \le e < \dfrac{Q}{2}, \\[2mm] 0, & \text{elsewhere.} \end{cases} \tag{2.77}$$

Figure 2.16a shows the PDF (2.77) of the quantization error for different variances of the
input.

Figure 2.16 (a) PDF of quantization error for different standard deviations of a Gaussian PDF input.
(b) Variance of quantization error for different standard deviations of a Gaussian PDF input.

For the mean value and the variance of a quantization error, it follows using (2.77) that
$E\{E\} = 0$ and

$$E\{E^2\} = \int_{-\infty}^{\infty} e^2\, p_E(e)\, de = \frac{Q^2}{12} \left[ 1 + \frac{12}{\pi^2} \sum_{k=1}^{\infty} \frac{(-1)^k}{k^2} \exp\!\left(-\frac{2\pi^2 k^2 \sigma^2}{Q^2}\right) \right]. \tag{2.78}$$

Figure 2.16b shows the variance of the quantization error (2.78) for different variances of
the input.

For a Gaussian PDF input as given by (2.75) and (2.76), the correlation (see (2.74))
between input and quantization error is expressed as

$$E\{X \cdot E\} = 2\sigma^2 \sum_{k=1}^{\infty} (-1)^k \exp\!\left(-\frac{2\pi^2 k^2 \sigma^2}{Q^2}\right). \tag{2.79}$$

The correlation is negligible for large values of $\sigma/Q$.
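The series (2.78) and (2.79) converge quickly and can be evaluated directly. The following Python sketch is an illustration with assumed function names and an assumed truncation of the sums after 50 terms.

```python
import math

def error_variance_gauss(sigma, q, terms=50):
    """Variance of the quantization error for a Gaussian input, following (2.78)."""
    s = sum((-1) ** k / k ** 2 * math.exp(-2.0 * math.pi ** 2 * k ** 2 * sigma ** 2 / q ** 2)
            for k in range(1, terms + 1))
    return q ** 2 / 12.0 * (1.0 + 12.0 / math.pi ** 2 * s)

def input_error_correlation_gauss(sigma, q, terms=50):
    """Correlation E{X E} between Gaussian input and quantization error, following (2.79)."""
    s = sum((-1) ** k * math.exp(-2.0 * math.pi ** 2 * k ** 2 * sigma ** 2 / q ** 2)
            for k in range(1, terms + 1))
    return 2.0 * sigma ** 2 * s

if __name__ == "__main__":
    for ratio in (0.1, 0.3, 0.5, 1.0):
        var = error_variance_gauss(ratio, 1.0)
        corr = input_error_correlation_gauss(ratio, 1.0)
        print("sigma/Q =", ratio, " 12*var/Q^2 =", round(12.0 * var, 4), " E{XE} =", round(corr, 6))
```

For $\sigma/Q = 1$ the variance is practically $Q^2/12$ and the correlation vanishes. For $\sigma/Q = 0.1$ the input rarely leaves the interval around zero, so the error is close to $-x$ and both results approach $\sigma^2$ in magnitude.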
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2.2 Dither 

2.2.1 Basics 

Requantization (renewed quantization of already quantized signals) to limited word-lengths
occurs repeatedly during storage, format conversion and signal processing algorithms.
Here, small signal levels lead to error signals which depend on the input. Owing
to quantization, nonlinear distortion occurs for low-level signals. The conditions for the
classical quantization model are no longer satisfied. To reduce these effects for signals
of small amplitude, a linearization of the nonlinear characteristic curve of the quantizer is
performed. This is done by adding a random sequence d(n) to the quantized signal x(n)
(see Fig. 2.17) before the actual quantization process. This random signal is called dither.
The specification of the word-length is shown in Fig. 2.18. The statistical independence of
the error signal from the input is not achieved, but the conditional moments of the error
signal can be affected [Lip92, Ger89, Wan92, Wan00].



Figure 2.17 Addition of a random sequence before a quantizer.

Figure 2.18 Specification of the word-length (labels: w bit, r bit, m, s; fields: sign, zero bits, dither).



The sequence d(n), with amplitude range $(-Q/2 \le d(n) \le Q/2)$, is generated with
the help of a random number generator and is added to the input. For a dither value with
$Q = 2^{-(w-1)}$:

$$d_k = k\,2^{-r} Q, \quad -2^{s-1} \le k \le 2^{s-1} - 1. \tag{2.80}$$

The index k of the random number $d_k$ characterizes the value from the set of $N = 2^s$
possible numbers with the probability

$$P(d_k) = \begin{cases} 2^{-s}, & -2^{s-1} \le k \le 2^{s-1} - 1, \\ 0, & \text{elsewhere.} \end{cases} \tag{2.81}$$

With the mean value $\bar{d} = \sum_k d_k P(d_k)$, the variance $\sigma_D^2 = \sum_k [d_k - \bar{d}]^2 P(d_k)$ and the
quadratic mean $\overline{d^2} = \sum_k d_k^2 P(d_k)$, we can rewrite the variance as $\sigma_D^2 = \overline{d^2} - \bar{d}^2$.

For a static input amplitude V and the dither value $d_k$ the rounding operation [Lip86] is
expressed as

$$g(V + d_k) = Q \left\lfloor \frac{V + d_k}{Q} + 0.5 \right\rfloor. \tag{2.82}$$
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For the mean of the output g{V) as a function of the input V, we can write 

$$\bar{g}(V) = \sum_k g(V + d_k)\, P(d_k). \tag{2.83}$$

The quadratic mean of the output $\overline{g^2}(V)$ for input V is given by

$$\overline{g^2}(V) = \sum_k g^2(V + d_k)\, P(d_k). \tag{2.84}$$

For the variance $d_R^2(V)$ for input V,

$$d_R^2(V) = \sum_k \{ g(V + d_k) - \bar{g}(V) \}^2 P(d_k) = \overline{g^2}(V) - \{\bar{g}(V)\}^2. \tag{2.85}$$

The above equations have the input V as a parameter. Figures 2.19 and 2.20 illustrate the
mean output $\bar{g}(V)$ and the standard deviation $d_R(V)$ within a quantization step size, given
by (2.83), (2.84) and (2.85). The examples of rounding and truncation demonstrate the
linearization of the characteristic curve of the quantizer. The coarse step size is replaced
by a finer one. The quadratic deviation from the mean output $d_R^2(V)$ is termed noise
modulation. For a uniform PDF dither, this noise modulation depends on the amplitude
(see Figs. 2.19 and 2.20). It is maximum in the middle of the quantization step size
and approaches zero toward the end. The linearization and the suppression of the noise
modulation can be achieved by a triangular PDF dither with bipolar characteristic [Van89]
and rounding operation (see Fig. 2.20). Triangular PDF dither is obtained by adding two
statistically independent dither signals with uniform PDF (convolution of PDFs). A dither
signal with higher-order PDF is not necessary for audio signals [Lip92, Wan00].
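The mean output (2.83) and the noise modulation (2.85) can be computed exactly for a small discrete dither set. The Python sketch below is an illustration under stated assumptions: the dither values are taken as $d_k = kQ/2^s$ with s = 4 (one possible reading of the scaling in (2.80)), rounding follows (2.82), the input V is evaluated on the grid of multiples of Q/16, and all function names are hypothetical.

```python
import math

def dither_values(q=1.0, s=4):
    """Discrete RECT dither d_k = k*Q/2^s for -2^(s-1) <= k <= 2^(s-1)-1 (assumed scaling)."""
    n = 2 ** s
    return [k * q / n for k in range(-n // 2, n // 2)]

def tri_dither_values(q=1.0, s=4):
    """All equally likely sums of two independent RECT dither values (TRI dither)."""
    d = dither_values(q, s)
    return [d1 + d2 for d1 in d for d2 in d]

def rounding(v, q=1.0):
    """Rounding operation as in (2.82)."""
    return q * math.floor(v / q + 0.5)

def mean_and_noise_modulation(v, dither, q=1.0):
    """Mean output (2.83) and variance d_R^2(V) (2.85) for equally likely dither values."""
    p = 1.0 / len(dither)
    mean = sum(rounding(v + d, q) * p for d in dither)
    quad = sum(rounding(v + d, q) ** 2 * p for d in dither)
    return mean, quad - mean ** 2

if __name__ == "__main__":
    for v16 in range(0, 17, 4):
        v = v16 / 16.0
        rect = mean_and_noise_modulation(v, dither_values())
        tri = mean_and_noise_modulation(v, tri_dither_values())
        print("V/Q =", v, " RECT (mean, var):", rect, " TRI (mean, var):", tri)
```

With RECT dither the variance changes from zero at V = 0 to $Q^2/4$ at V = Q/2, which is the noise modulation described above. With TRI dither the variance stays constant over V. Because the assumed dither set is not exactly symmetric about zero, the TRI mean output carries a small constant offset.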

Figure 2.19 Truncation - linearizing and suppression of noise modulation (s = 4, m = 0). Panels: unipolar RECT dither and unipolar TRI dither, plotted over V/Q.

The total noise power for this quantization technique consists of the dither power and 
the power of the quantization error [Lip86]. The following noise powers are obtained by 
integration with respect to V . 






Figure 2.20 Rounding - linearizing and suppression of noise modulation (s = 4, m = 1). Panels: bipolar RECT dither and bipolar TRI dither, showing $\bar{g}(V)$ and $d_R(V)$ over V/Q.
The integrals in (2.91) are independent of $d_k$. Moreover, $\sum_k P(d_k) = 1$. With the mean
value of the quantization error

$$\bar{e} = \frac{1}{Q} \int_0^Q Q(V)\, dV \tag{2.92}$$

and the quadratic mean error

$$\overline{e^2} = \frac{1}{Q} \int_0^Q Q^2(V)\, dV, \tag{2.93}$$

it is possible to rewrite (2.91) as

$$d_{tot}^2 = \overline{e^2} + 2\bar{d}\,\bar{e} + \overline{d^2}. \tag{2.94}$$

With $\sigma_E^2 = \overline{e^2} - \bar{e}^2$ and $\sigma_D^2 = \overline{d^2} - \bar{d}^2$, (2.94) can be written as

$$d_{tot}^2 = \sigma_E^2 + (\bar{d} + \bar{e})^2 + \sigma_D^2. \tag{2.95}$$

Equations (2.94) and (2.95) describe the total noise power as a function of the quantization
($\bar{e}$, $\overline{e^2}$, $\sigma_E^2$) and the dither addition ($\bar{d}$, $\overline{d^2}$, $\sigma_D^2$). It can be seen that for zero-mean
quantization, the middle term in (2.95) results in $\bar{d} + \bar{e} = 0$. The acoustically perceptible part of the
total error power is represented by $\sigma_E^2$ and $\sigma_D^2$.



2.2.2 Implementation 

The random sequence d(n) is generated with the help of a random number generator
with uniform PDF. For generating a triangular PDF random sequence, two independent
uniform PDF random sequences $d_1(n)$ and $d_2(n)$ can be added. In order to generate a
triangular high-pass dither, the dither value $d_1(n)$ is added to $-d_1(n-1)$. Thus, only one
random number generator is required. In conclusion, the following dither sequences can be
implemented:

$$d_{RECT}(n) = d_1(n), \tag{2.96}$$

$$d_{TRI}(n) = d_1(n) + d_2(n), \tag{2.97}$$

$$d_{HP}(n) = d_1(n) - d_1(n-1). \tag{2.98}$$

The power density spectra of triangular PDF dither and triangular PDF HP dither are shown
in Fig. 2.21. Figure 2.22 shows histograms of a uniform PDF dither and a triangular PDF
high-pass dither together with their respective power density spectra. The amplitude range
of a uniform PDF dither lies between $\pm Q/2$, whereas it lies between $\pm Q$ for triangular
PDF dither. The total noise power for triangular PDF dither is doubled.
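The three dither sequences (2.96) to (2.98) can be sketched in a few lines of Python. This is a minimal illustration, assuming continuous-valued uniform random numbers from the standard library generator instead of the s-bit dither words of (2.80); the function names and the seed are hypothetical.

```python
import random

def dither_sequences(n_samples, q=1.0, seed=3):
    """Return RECT, TRI and HP dither sequences following (2.96)-(2.98)."""
    rng = random.Random(seed)
    # One extra sample so that d1(n-1) is available for the first output value.
    d1 = [rng.uniform(-q / 2.0, q / 2.0) for _ in range(n_samples + 1)]
    d2 = [rng.uniform(-q / 2.0, q / 2.0) for _ in range(n_samples + 1)]
    rect = [d1[n] for n in range(1, n_samples + 1)]
    tri = [d1[n] + d2[n] for n in range(1, n_samples + 1)]
    hp = [d1[n] - d1[n - 1] for n in range(1, n_samples + 1)]
    return rect, tri, hp

def power(seq):
    """Mean power of a sequence."""
    return sum(v * v for v in seq) / len(seq)

if __name__ == "__main__":
    rect, tri, hp = dither_sequences(50000)
    print("power RECT:", power(rect), "(Q^2/12 =", 1.0 / 12.0, ")")
    print("power TRI: ", power(tri), "(Q^2/6  =", 1.0 / 6.0, ")")
    print("power HP:  ", power(hp))
```

The HP sequence needs only one random number generator and has the same power as the TRI sequence, but neighbouring values are negatively correlated, which shifts its power toward high frequencies.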

2.2.3 Examples 

The effect of the input amplitude of the quantizer is shown in Fig. 2.23 for a 16-bit
quantizer ($Q = 2^{-15}$). A quantized sinusoidal signal with amplitude $2^{-15}$ (1-bit amplitude)
and frequency $f/f_S = 64/1024$ is shown in Fig. 2.23a,b for truncation and rounding.
Figure 2.23c,d shows their corresponding spectra. For truncation, Fig. 2.23c shows the
spectral line of the signal and the spectral distribution of the quantization error with the
harmonics of the input signal. For rounding (Fig. 2.23d with special signal frequency
$f/f_S = 64/1024$), the quantization error is concentrated in odd harmonics.

Figure 2.21 Normalized power density spectrum for triangular PDF dither (TRI) with $d_1(n) + d_2(n)$
and triangular PDF high-pass dither (HP) with $d_1(n) - d_1(n-1)$.

In the following, only the rounding operation is used. By adding a uniform PDF random
signal to the actual signal before quantization, the quantized signal shown in Fig. 2.24a
results. The corresponding power density spectrum is illustrated in Fig. 2.24c. In the time
domain, it is observed that the 1-bit amplitudes approach zero so that the regular pattern of
the quantized signal is affected. The resulting power density spectrum in Fig. 2.24c shows
that the harmonics do not occur anymore and the noise power is uniformly distributed over
the frequencies. For triangular PDF dither, the quantized signal is shown in Fig. 2.24b.
Owing to the triangular PDF, amplitudes $\pm 2Q$ occur besides the signal values $\pm Q$ and zero.
Figure 2.24d shows the increase of the total noise power.

In order to illustrate the noise modulation for uniform PDF dither, the amplitude of
the input is reduced to $A = 2^{-18}$ and the frequency is chosen as $f/f_S = 14/1024$. This
means that input amplitude to the quantizer is 0.25 bit. For a quantizer without additive
dither, the quantized output signal is zero. For RECT dither, the quantized signal is shown
in Fig. 2.25a. The input signal with amplitude 0.25 Q is also shown. The power density
spectrum of the quantized signal is shown in Fig. 2.25c. The spectral line of the signal and
the uniform distribution of the quantization error can be seen. But in the time domain,
a correlation between positive and negative amplitudes of the input and the quantized
positive and negative values of the output can be observed. In hearing tests this noise
modulation occurs if the amplitude of the input is decreased continuously and falls below
the amplitude of the quantization step. This happens in all fade-out processes that
occur in speech and music signals. For positive low-amplitude signals, two output states,
zero and Q, occur, and for negative low-amplitude signals, the output states zero and $-Q$
occur. This is observed as a disturbing rattle which is superimposed on the actual signal. If the
input level is further reduced the quantized output approaches zero.

Figure 2.22 (a,b) Histogram and (c,d) power density spectrum of uniform PDF dither (RECT) with
$d_1(n)$ and triangular PDF high-pass dither (HP) with $d_1(n) - d_1(n-1)$.

In order to reduce this noise modulation at low levels, a triangular PDF dither is
used. Figure 2.25b shows the quantized signal and Fig. 2.25d shows the power density
spectrum. It can be observed that the quantized signal has an irregular pattern. Hence a
direct association of positive half-waves with the positive output values, and vice
versa, is not possible. The power density spectrum shows the spectral line of the signal
along with an increase in noise power owing to triangular PDF dither. In acoustic hearing
tests, the use of triangular PDF dither results in a constant noise floor even if the input level
is reduced to zero.
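The behavior for a 0.25-bit input can be reproduced with a short simulation. The Python sketch below is a hedged illustration: it works with a normalized step size Q = 1, uses continuous-valued TRI dither from the standard library generator, and estimates the amplitude of the sinusoid contained in the quantized output by correlation with the input frequency. The function names, the number of samples and the seed are assumptions.

```python
import math
import random

def requantize(x, q, dither=0.0):
    """Rounding quantizer applied to the sum of input and dither."""
    return q * math.floor((x + dither) / q + 0.5)

def sine_amplitude_estimate(use_tri_dither, q=1.0, amplitude=0.25, n_samples=32768, seed=4):
    """Quantize a low-level sinusoid and estimate the sinusoid amplitude in the output."""
    rng = random.Random(seed)
    omega = 2.0 * math.pi * 14.0 / 1024.0            # f/f_S = 14/1024 as in the text
    acc = 0.0
    for n in range(n_samples):
        x = amplitude * q * math.sin(omega * n)
        d = 0.0
        if use_tri_dither:
            d = rng.uniform(-q / 2.0, q / 2.0) + rng.uniform(-q / 2.0, q / 2.0)
        acc += requantize(x, q, d) * math.sin(omega * n)
    return 2.0 * acc / n_samples

if __name__ == "__main__":
    print("without dither:", sine_amplitude_estimate(False))   # output is identically zero
    print("with TRI dither:", sine_amplitude_estimate(True))   # close to 0.25 Q
```

Without dither the rounded output is zero and the signal is lost, whereas with TRI dither the sinusoid is preserved in the mean of the quantized output at the cost of a higher noise floor.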
Figure 2.23 One-bit amplitude - quantizer with truncation (a,c) and rounding (b,d).



2.3 Spectrum Shaping of Quantization - Noise Shaping 

Using the linear model of a quantizer in Fig. 2.26 and the relations

$$e(n) = y(n) - x(n), \tag{2.99}$$

$$y(n) = [x(n)]_Q \tag{2.100}$$

$$= x(n) + e(n), \tag{2.101}$$

the quantization error e(n) may be isolated and fed back through a transfer function H(z)
as shown in Fig. 2.27. This leads to the spectral shaping of the quantization error as given
by

$$y(n) = [x(n) - e(n) * h(n)]_Q \tag{2.102}$$

$$= x(n) + e(n) - e(n) * h(n), \tag{2.103}$$

$$e_1(n) = y(n) - x(n) \tag{2.104}$$

$$= e(n) * (\delta(n) - h(n)), \tag{2.105}$$
Figure 2.24 One-bit amplitude - rounding with RECT dither (a,c) and TRI dither (b,d).



and the corresponding Z-transforms

$$Y(z) = X(z) + E(z)(1 - H(z)), \tag{2.106}$$

$$E_1(z) = E(z)(1 - H(z)). \tag{2.107}$$

A simple spectrum shaping of the quantization error e(n) is achieved by feeding back with
$H(z) = z^{-1}$ as shown in Fig. 2.28, and leads to

$$y(n) = [x(n) - e(n-1)]_Q \tag{2.108}$$

$$= x(n) - e(n-1) + e(n), \tag{2.109}$$

$$e_1(n) = y(n) - x(n) \tag{2.110}$$

$$= e(n) - e(n-1), \tag{2.111}$$

and the Z-transforms

$$Y(z) = X(z) + E(z)(1 - z^{-1}), \tag{2.112}$$

$$E_1(z) = E(z)(1 - z^{-1}). \tag{2.113}$$
Figure 2.25 0.25-bit amplitude - rounding with RECT dither (a,c) and TRI dither (b,d).

Figure 2.26 Linear model of quantizer.



Equation (2.113) shows a high-pass weighting of the original error signal e(n). By choosing
$H(z) = z^{-1}(2 - z^{-1})$, so that $1 - H(z) = 1 - 2z^{-1} + z^{-2}$ in (2.107), second-order high-pass weighting given by

$$E_2(z) = E(z)(1 - 2z^{-1} + z^{-2}) \tag{2.114}$$

can be achieved. The power density spectrum of the error signal for the two cases is given by

$$S_{E_1E_1}(e^{j\Omega}) = |1 - e^{-j\Omega}|^2\, S_{EE}(e^{j\Omega}), \tag{2.115}$$

$$S_{E_2E_2}(e^{j\Omega}) = |1 - 2e^{-j\Omega} + e^{-j2\Omega}|^2\, S_{EE}(e^{j\Omega}). \tag{2.116}$$
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Figure 2.29 shows the weighting of power density spectrum by this noise shaping 
technique. 
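The first-order error feedback of (2.108) to (2.111) can be sketched as follows. This Python example is an illustration, assuming a rounding quantizer, a normalized step size Q = 1 and zero initial state; the function name is hypothetical.

```python
import math

def noise_shaper_first_order(x, q=1.0):
    """Quantize x(n) with error feedback H(z) = z^-1; returns outputs y(n) and errors e(n)."""
    y, e = [], []
    e_prev = 0.0
    for xn in x:
        v = xn - e_prev                      # quantizer input x(n) - e(n-1), see (2.108)
        yn = q * math.floor(v / q + 0.5)     # rounding quantizer [.]_Q
        e_prev = yn - v                      # quantization error e(n)
        y.append(yn)
        e.append(e_prev)
    return y, e

if __name__ == "__main__":
    x = [0.3 * math.sin(0.01 * n) + 0.2 for n in range(2000)]
    y, e = noise_shaper_first_order(x)
    # Total error e1(n) = y(n) - x(n) equals e(n) - e(n-1), see (2.111).
    e1 = [yn - xn for yn, xn in zip(y, x)]
    running_sum = 0.0
    largest = 0.0
    for value in e1:
        running_sum += value
        largest = max(largest, abs(running_sum))
    print("largest accumulated error:", largest)   # bounded by Q/2
```

Because $e_1(n)$ is a first difference, its running sum never exceeds Q/2 in magnitude. This is the time-domain counterpart of the zero of $1 - z^{-1}$ at frequency zero in (2.113).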




Figure 2.27 Spectrum shaping of quantization error. 




Figure 2.28 High-pass spectrum shaping of quantization error. 




By adding a dither signal d(n) (see Fig. 2.30), the output and the error are given by

$$y(n) = [x(n) + d(n) - e(n-1)]_Q \tag{2.117}$$

$$= x(n) + d(n) - e(n-1) + e(n) \tag{2.118}$$

and

$$e_1(n) = y(n) - x(n) \tag{2.119}$$

$$= d(n) + e(n) - e(n-1). \tag{2.120}$$

For the Z-transforms we write

$$Y(z) = X(z) + E(z)(1 - z^{-1}) + D(z), \tag{2.121}$$

$$E_1(z) = E(z)(1 - z^{-1}) + D(z). \tag{2.122}$$

The modified error signal $e_1(n)$ consists of the dither and the high-pass shaped quantization
error.



Figure 2.30 Dither and spectrum shaping.

By moving the addition (Fig. 2.31) of the dither directly before the quantizer, a
high-pass spectrum shaping is obtained for both the error signal and the dither. Here the
following relationships hold:

$$y(n) = [x(n) + d(n) - e_0(n-1)]_Q \tag{2.123}$$

$$= x(n) + d(n) - e_0(n-1) + e(n), \tag{2.124}$$

$$e_0(n) = y(n) - (x(n) - e_0(n-1)) \tag{2.125}$$

$$= d(n) + e(n), \tag{2.126}$$

$$y(n) = x(n) + d(n) - d(n-1) + e(n) - e(n-1), \tag{2.127}$$

$$e_1(n) = d(n) - d(n-1) + e(n) - e(n-1), \tag{2.128}$$

with the Z-transforms given by

$$Y(z) = X(z) + E(z)(1 - z^{-1}) + D(z)(1 - z^{-1}), \tag{2.129}$$

$$E_1(z) = E(z)(1 - z^{-1}) + D(z)(1 - z^{-1}). \tag{2.130}$$
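The structure with the dither added directly before the quantizer, (2.123) to (2.128), can be sketched in Python as follows. This is an illustration under assumptions: RECT dither from the standard library generator, a rounding quantizer, a normalized step size Q = 1 and zero initial state. The function name and seed are hypothetical.

```python
import math
import random

def dithered_noise_shaper(x, q=1.0, seed=5):
    """Dither added before the quantizer with e0(n) fed back, following (2.123)-(2.126)."""
    rng = random.Random(seed)
    y, d, e = [], [], []
    e0_prev = 0.0
    for xn in x:
        dn = rng.uniform(-q / 2.0, q / 2.0)      # RECT dither (an assumed choice)
        v = xn + dn - e0_prev                    # quantizer input, see (2.123)
        yn = q * math.floor(v / q + 0.5)         # y(n) = [v]_Q
        en = yn - v                              # quantization error e(n)
        e0_prev = yn - (xn - e0_prev)            # e0(n) = d(n) + e(n), see (2.125)-(2.126)
        y.append(yn)
        d.append(dn)
        e.append(en)
    return y, d, e

if __name__ == "__main__":
    x = [0.25 * math.sin(2.0 * math.pi * 14.0 / 1024.0 * n) for n in range(1024)]
    y, d, e = dithered_noise_shaper(x)
    # Check (2.128): e1(n) = d(n) - d(n-1) + e(n) - e(n-1) for n >= 1.
    worst = max(abs((y[n] - x[n]) - (d[n] - d[n - 1] + e[n] - e[n - 1]))
                for n in range(1, len(x)))
    print("largest deviation from (2.128):", worst)
```

The check confirms numerically that both the dither and the quantization error appear as first differences in the output, so both are high-pass shaped.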



Apart from the discussed feedback structures which are easy to implement on a digital
signal processor and which lead to high-pass noise shaping, psychoacoustic-based noise
shaping methods have been proposed in the literature [Ger89, Wan92, Hel07]. These
methods use special approximations of the hearing threshold (threshold in quiet, absolute
threshold) for the feedback structure $1 - H(z)$. Figure 2.32a shows several hearing
threshold models as a function of frequency [ISO389, Ter79, Wan92]. It can be seen
that the sensitivity of human hearing is high for frequencies between 2 and 6 kHz and
sharply decreases for high and low frequencies. Figure 2.32b also shows the inverse ISO
389-7 threshold curve which represents an approximation of the filtering operation in our
perception. The feedback filter of the noise shaper should affect the quantization error
with the inverse ISO 389 weighting curve. Hence, the noise power in the frequency range
with high sensitivity should be reduced and shifted toward lower and higher frequencies.
Figure 2.33a shows the unweighted power density spectra of the quantization error for three
special filters H(z) [Wan92, Hel07]. Figure 2.33b depicts the same three power density
spectra, weighted by the inverse ISO 389 threshold of Fig. 2.32b. These weighted power
density spectra show that the perceived noise power is reduced by all three noise shapers
versus the frequency axis. Figure 2.34 shows a sinusoid with amplitude $Q = 2^{-15}$, which
is quantized to w = 16 bits with psychoacoustic noise shaping. The quantized signal $x_Q(n)$
consists of different amplitudes reflecting the low-level signal. The power density spectrum
of the quantized signal reflects the psychoacoustic weighting of the noise shaper with a
fixed filter. A time-variant psychoacoustic noise shaping is described in [DeK03, Hel07],
where the instantaneous masking threshold is used for adaptation of a time-variant filter.







2.4 Number Representation 

The different applications in digital signal processing and transmission of audio signals
lead to the question of the type of number representation for digital audio signals. In
this section, basic properties of fixed-point and floating-point number representation in the
context of digital audio signal processing are presented.

2.4.1 Fixed-point Number Representation 

In general, an arbitrary real number $x$ can be approximated by a finite summation

$$x_Q = \sum_{i=0}^{w-1} b_i\, 2^i, \tag{2.131}$$

where the possible values for $b_i$ are 0 and 1.

The fixed-point number representation with a finite number $w$ of binary places leads to
four different interpretations of the number range (see Table 2.1 and Fig. 2.35).

Figure 2.32 (a) Hearing thresholds in quiet, (b) Inverse ISO 389-7 threshold curve.



Table 2.1 Bit location and range of values.

Type              Bit location                                            Range of values
Signed 2’s c.     $x_Q = -b_0 + \sum_{i=1}^{w-1} b_{-i} 2^{-i}$           $-1 \le x_Q \le 1 - 2^{-(w-1)}$
Unsigned 2’s c.   $x_Q = \sum_{i=1}^{w} b_{-i} 2^{-i}$                    $0 \le x_Q \le 1 - 2^{-w}$
Signed int.       $x_Q = -b_{w-1} 2^{w-1} + \sum_{i=0}^{w-2} b_i 2^i$     $-2^{w-1} \le x_Q \le 2^{w-1} - 1$
Unsigned int.     $x_Q = \sum_{i=0}^{w-1} b_i 2^i$                        $0 \le x_Q \le 2^w - 1$



The signed fractional representation (2’s complement) is the usual format for digital 
audio signals and for algorithms in fixed-point arithmetic. For address and modulo 
operation, the unsigned integer is used. Owing to finite word-length w, overflow occurs 
as shown in Fig. 2.36. These curves have to be taken into consideration while carrying out 
operations, especially additions in 2’s complement arithmetic. 
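As an illustration of the four interpretations in Table 2.1 and of the overflow behavior,
the following Python sketch evaluates one bit pattern in all four formats and performs a
wrap-around addition. It is a minimal sketch, not part of the original text; the function
names `interpretations` and `add_wraparound` and the bit-string input format (MSB first)
are assumptions made for the example.

```python
def interpretations(bits):
    """Evaluate a bit string (MSB first) in the four formats of Table 2.1."""
    w = len(bits)
    u = int(bits, 2)                               # unsigned integer
    s = u - (1 << w) if bits[0] == "1" else u      # signed integer (2's complement)
    return {
        "signed_fractional": s / 2 ** (w - 1),     # range -1 ... 1 - 2**-(w-1)
        "unsigned_fractional": u / 2 ** w,         # range 0 ... 1 - 2**-w
        "signed_integer": s,
        "unsigned_integer": u,
    }


def add_wraparound(a_bits, b_bits):
    """Add two w-bit words and discard the carry, as a w-bit adder would."""
    w = len(a_bits)
    total = (int(a_bits, 2) + int(b_bits, 2)) % (1 << w)
    return format(total, "0{}b".format(w))


print(interpretations("011"))
overflowed = add_wraparound("011", "001")          # 0.75 + 0.25 in signed fractional
print(overflowed, interpretations(overflowed)["signed_fractional"])
```

For $w = 3$ the sum of 0.75 and 0.25 leaves the representable range and wraps around to
$-1$, which is the kind of overflow that has to be considered for additions.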

Quantization is carried out with techniques as shown in Table 2.2 for rounding and
truncation. The quantization step size is characterized by $Q = 2^{-(w-1)}$ and the symbol
$\lfloor x \rfloor$ denotes the biggest integer smaller than or equal to $x$. Figure 2.37 shows
the rounding and truncation curves for 2’s complement number representation. The absolute
error shown in Fig. 2.37 is given by $e = x_Q - x$.

Figure 2.33 Power density spectrum of three filter approximations (Wa3 third-order filter, Wa9
ninth-order filter, He8 eighth-order filter [Wan92, Hel07]): (a) unweighted PSDs, (b) inverse
ISO 389-7 weighted PSDs.



Table 2.2 Rounding and truncation of 2’s complement numbers.

Type          Quantization                                  Error limits
2’s c. (r)    $x_Q = Q \lfloor Q^{-1} x + 0.5 \rfloor$      $-Q/2 < x_Q - x \le Q/2$
2’s c. (t)    $x_Q = Q \lfloor Q^{-1} x \rfloor$            $-Q < x_Q - x \le 0$
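The two quantization rules of Table 2.2 translate almost literally into code. The sketch
below is an illustration under stated assumptions: the input is already normalized to the
range $-1$ to $+1$, no saturation is applied at the upper range limit, and the function
name `quantize` and its `mode` argument are chosen only for this example.

```python
import math


def quantize(x, w, mode="round"):
    """Quantize x to w bits with step size Q = 2**-(w-1).

    mode="round": x_Q = Q * floor(x / Q + 0.5)
    mode="trunc": x_Q = Q * floor(x / Q)
    """
    q = 2.0 ** -(w - 1)
    if mode == "round":
        return q * math.floor(x / q + 0.5)
    if mode == "trunc":
        return q * math.floor(x / q)
    raise ValueError("mode must be 'round' or 'trunc'")


# 3-bit example (Q = 0.25): print input, rounded and truncated value.
for value in (0.4, -0.1, 0.74):
    print(value, quantize(value, 3, "round"), quantize(value, 3, "trunc"))
```

Truncation always moves the value toward $-\infty$, so its error is never positive, whereas
the rounding error is centered around zero.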



Digital audio signals are coded in the 2’s complement number representation. For 2’s
complement representation, the range of values from $-X_{\max}$ to $+X_{\max}$ is normalized to
the range $-1$ to $+1$ and is represented by the weighted finite sum
$x_Q = -b_0 + b_1 \cdot 0.5 + b_2 \cdot 0.25 + b_3 \cdot 0.125 + \cdots + b_{w-1} \cdot 2^{-(w-1)}$.
The variables $b_0$ to $b_{w-1}$ are called bits and can take the values 1 or 0. The bit $b_0$ is
called the MSB (most significant bit) and $b_{w-1}$ is called the LSB (least significant bit).
For positive numbers, $b_0$ is equal to 0 and for negative numbers $b_0$ equals 1. For a 3-bit
quantization (see Fig. 2.38), a quantized value can be represented by
$x_Q = -b_0 + b_1 \cdot 0.5 + b_2 \cdot 0.25$. The smallest quantization step size is
0.25. For a positive number 0.75 it follows that $0.75 = -0 + 1 \cdot 0.5 + 1 \cdot 0.25$. The
binary coding for 0.75 is 011.

Figure 2.34 Psychoacoustic noise shaping: signal $x(n)$, quantized signal $x_Q(n)$ and power
density spectrum of quantized signal.

Figure 2.38 Rounding curve and error signal for $w = 3$ bits.



Dynamic Range. The dynamic range of a number representation is defined as the ratio of
maximum to minimum number. For fixed-point representation with

$$x_{Q\max} = 1 - 2^{-(w-1)}, \tag{2.132}$$

$$x_{Q\min} = 2^{-(w-1)}, \tag{2.133}$$

the dynamic range is given by

$$\mathrm{DR}_F = 20\log_{10}\left(\frac{x_{Q\max}}{x_{Q\min}}\right) = 20\log_{10}\left(\frac{1 - Q}{Q}\right) = 20\log_{10}(2^{w-1} - 1)\ \mathrm{dB}. \tag{2.134}$$
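To get a feeling for the numbers behind (2.134), the formula can be evaluated for typical
word-lengths. This is a small illustrative sketch; the function name `dynamic_range_fixed`
is an assumption and not part of the original text.

```python
import math


def dynamic_range_fixed(w):
    """Dynamic range in dB of a w-bit fixed-point format, following (2.134)."""
    return 20.0 * math.log10(2 ** (w - 1) - 1)


for w in (16, 20, 24):
    print(w, "bits:", round(dynamic_range_fixed(w), 2), "dB")
```

For $w = 16$ this gives about 90.3 dB, and each additional bit adds roughly 6 dB.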

Multiplication and Addition of Fixed-point Numbers. For the multiplication of two
fixed-point numbers in the range from $-1$ to $+1$, the magnitude of the result does not
exceed 1. For the addition of two fixed-point numbers, care must be taken for the result to
remain in the range from $-1$ to $+1$. An addition of $0.6 + 0.7 = 1.3$ must be carried out in
the form $0.5(0.6 + 0.7) = 0.65$. This multiplication with the factor 0.5 or generally $2^{-s}$
is called scaling. An integer in the range from 1 to, for instance, 8 is chosen for the scaling
coefficient $s$.

Error Model. The quantization process for fixed-point numbers can be approximated as
the addition of an error signal $e(n)$ to the signal $x(n)$ (see Fig. 2.39). The error signal is
a random signal with white power density spectrum.

Figure 2.39 Model of a fixed-point quantizer.



Signal-to-noise Ratio. The signal-to-noise ratio for a fixed-point quantizer is defined by

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma_X^2}{\sigma_E^2}\right), \tag{2.135}$$

where $\sigma_X^2$ is the signal power and $\sigma_E^2$ is the noise power.

2.4.2 Floating-point Number Representation 

The representation of a floating-point number is given by

$$x_G = M_G\, 2^{E_G}, \tag{2.136}$$

$$0.5 \le M_G < 1, \tag{2.137}$$

where $M_G$ denotes the normalized mantissa and $E_G$ the exponent. The normalized standard
format (IEEE) is shown in Fig. 2.40 and special cases are given in Table 2.3. The
mantissa $M$ is implemented with a word-length of $w_M$ bits and is in fixed-point number
representation. The exponent $E$ is implemented with a word-length of $w_E$ bits and is an
integer in the range from $-2^{w_E-1} + 2$ to $2^{w_E-1} - 1$. For an exponent word-length of
$w_E = 8$ bits, its range of values lies between $-126$ and $+127$. The range of values of
the mantissa lies between 0.5 and 1. This is referred to as the normalized mantissa and is
responsible for the unique representation of a number. For a fixed-point number in the range
between 0.5 and 1, it follows that the exponent of the floating-point number representation
is $E = 0$. To represent a fixed-point number in the range between 0.25 and 0.5 in
floating-point form, the range of values of the normalized mantissa $M$ lies between 0.5 and 1,
and for the exponent it follows that $E = -1$. As an example, for a fixed-point number 0.75 the
floating-point number $0.75 \cdot 2^{0}$ results. The fixed-point number 0.375 is not represented
as the floating-point number $0.375 \cdot 2^{0}$. With the normalized mantissa, the floating-point
number is expressed as $0.75 \cdot 2^{-1}$. Owing to normalization, the ambiguity of floating-point
number representation is avoided. Numbers greater than 1 can also be represented. For example,
1.5 becomes $0.75 \cdot 2^{1}$ in floating-point number representation.
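The normalization described above happens to match the convention of the Python standard
library function `math.frexp`, which returns a mantissa with magnitude in the range from 0.5
(inclusive) to 1 (exclusive) together with an integer exponent. The following sketch
reproduces the three examples of the text; the wrapper name `normalize` is an assumption
made for illustration.

```python
import math


def normalize(x):
    """Return (M, E) with x = M * 2**E and 0.5 <= |M| < 1 for nonzero x."""
    if x == 0.0:
        raise ValueError("zero has no normalized mantissa")
    return math.frexp(x)


for value in (0.75, 0.375, 1.5):
    m, e = normalize(value)
    print(value, "=", m, "* 2 **", e)
```

All three numbers share the mantissa 0.75 and differ only in the exponent (0, $-1$ and 1).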

Figure 2.41 shows the rounding and truncation curves for floating-point representation
and the absolute error $e = x_Q - x$. The curves for floating-point quantization show
that for small amplitudes small quantization step sizes occur. In contrast to fixed-point
representation, the absolute error is dependent on the input signal.

Figure 2.40 Floating-point number representation.

Table 2.3 Special cases of floating-point number representation.

Type        Exponent             Mantissa    Value
NAN         255                  $\ne 0$     Undefined
Infinity    255                  0           $(-1)^s$ infinity
Normal      $1 \le e \le 254$    Any         $(-1)^s (0.m)\, 2^{e-127}$
Zero        0                    0           $(-1)^s \cdot 0$

Figure 2.41 Rounding and truncation curves for floating-point representation (panels: rounding,
truncation, and the corresponding absolute errors, each for $w_M = 3$, $w_E = 3$).

In the interval

$$2^{E_G} \le x < 2^{E_G + 1}, \tag{2.138}$$

the quantization step is given by

$$Q_G = 2^{-(w_M - 1)}\, 2^{E_G}. \tag{2.139}$$

For the relative error

$$e_r = \frac{x_Q - x}{x} \tag{2.140}$$

of the floating-point representation, a constant upper limit can be stated as

$$|e_r| \le 2^{-(w_M - 1)}. \tag{2.141}$$
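The bound (2.141) can be verified with a simple floating-point quantizer. The sketch below
rounds only the mantissa to $w_M$ bits and leaves the exponent unrestricted, which is a
simplifying assumption (no overflow or underflow handling). It uses the mantissa convention
of `math.frexp` (magnitude between 0.5 and 1); the function name `quantize_float` is
illustrative.

```python
import math


def quantize_float(x, w_m):
    """Round the normalized mantissa of x to w_m bits (exponent range unlimited)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= |m| < 1
    q = 2.0 ** -(w_m - 1)                  # step size of the mantissa
    m_q = q * math.floor(m / q + 0.5)      # rounding as for fixed-point numbers
    return math.ldexp(m_q, e)              # m_q * 2**e


w_m = 4
bound = 2.0 ** -(w_m - 1)
for value in (0.3, 3.0, 0.011, 97.0):
    xq = quantize_float(value, w_m)
    print(value, xq, abs((xq - value) / value) <= bound)
```

The absolute step size doubles with every increment of the exponent, while the relative
error stays below the constant limit.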



Dynamic Range. With the maximum and minimum numbers given by

$$x_{G\max} = (1 - 2^{-(w_M - 1)})\, 2^{E_{G\max}}, \tag{2.142}$$

$$x_{G\min} = 0.5 \cdot 2^{E_{G\min}}, \tag{2.143}$$

and

$$E_{G\max} = 2^{w_E - 1} - 1, \tag{2.144}$$

$$E_{G\min} = -2^{w_E - 1} + 2, \tag{2.145}$$

the dynamic range for floating-point representation is given by

$$\mathrm{DR}_G = 20\log_{10}\left(\frac{(1 - 2^{-(w_M - 1)})\, 2^{E_{G\max}}}{0.5 \cdot 2^{E_{G\min}}}\right)
= 20\log_{10}\left((1 - 2^{-(w_M - 1)})\, 2^{E_{G\max} - E_{G\min} + 1}\right)
= 20\log_{10}\left((1 - 2^{-(w_M - 1)})\, 2^{2^{w_E} - 2}\right)\ \mathrm{dB}. \tag{2.146}$$
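Equation (2.146) can be evaluated directly. The sketch below works with logarithms so that
the large power of two never has to be formed explicitly; the function name
`dynamic_range_float` is an assumption made for this illustration.

```python
import math


def dynamic_range_float(w_m, w_e):
    """Dynamic range in dB of a floating-point format, following (2.146)."""
    mantissa_term = math.log10(1.0 - 2.0 ** -(w_m - 1))
    exponent_term = (2 ** w_e - 2) * math.log10(2.0)
    return 20.0 * (mantissa_term + exponent_term)


print(round(dynamic_range_float(16, 4), 2))    # small format: 16-bit mantissa, 4-bit exponent
print(round(dynamic_range_float(24, 8), 2))    # 24-bit mantissa, 8-bit exponent
```

The exponent word-length dominates the result: with $w_E = 8$ the dynamic range is roughly
1529 dB, far beyond what a fixed-point format of comparable word-length offers.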



Multiplication and Addition of Floating-point Numbers. For multiplications with
floating-point numbers, the exponents of both numbers $x_{G1} = M_1 2^{E_1}$ and
$x_{G2} = M_2 2^{E_2}$ are added and the mantissas are multiplied. The resulting exponent
$E_G = E_1 + E_2$ is adjusted so that $M_G = M_1 M_2$ lies in the interval $0.5 \le M_G < 1$.
For additions the smaller number is denormalized to get the same exponent. Then both
mantissas are added and the result is normalized.

Error Model. With the definition of the relative error $e_r(n) = [x_Q(n) - x(n)]/x(n)$ the
quantized signal can be written as

$$x_Q(n) = x(n)\,(1 + e_r(n)) = x(n) + x(n)\, e_r(n). \tag{2.147}$$

Floating-point quantization can be modeled as an additive error signal $e(n) = x(n)\, e_r(n)$
to the signal $x(n)$ (see Fig. 2.42).

Signal-to-noise Ratio. Under the assumption that the relative error is independent of the
input $x$, the noise power of the floating-point quantizer can be written as

$$\sigma_E^2 = \sigma_X^2 \cdot \sigma_{E_r}^2. \tag{2.148}$$

Figure 2.42 Model of a floating-point quantizer.

For the signal-to-noise ratio, we can derive

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma_X^2}{\sigma_E^2}\right) = 10\log_{10}\left(\frac{\sigma_X^2}{\sigma_X^2\, \sigma_{E_r}^2}\right) = 10\log_{10}\left(\frac{1}{\sigma_{E_r}^2}\right). \tag{2.149}$$

Equation (2.149) shows that the signal-to-noise ratio is independent of the level of the
input. It is only dependent on the noise power $\sigma_{E_r}^2$ which, in turn, is only dependent
on the word-length $w_M$ of the mantissa of the floating-point representation.

2.4.3 Effects on Format Conversion and Algorithms 

First, a comparison of signal-to-noise ratios is made for the fixed-point and floating-point
number representation. Figure 2.43 shows the signal-to-noise ratio as a function of
input level for both number representations. The fixed-point word-length is $w = 16$ bits.
The word-length of the mantissa in floating-point representation is also $w_M = 16$ bits,
whereas that of the exponent is $w_E = 4$ bits. The signal-to-noise ratio for floating-point
representation shows that it is independent of input level and varies as a sawtooth curve
in a 6 dB grid. If the input level is so low that a normalization of the mantissa due to
finite number representation is not possible, then the signal-to-noise ratio is comparable to
fixed-point representation. While using the full range, both fixed-point and floating-point
result in the same signal-to-noise ratio. It can be observed that the signal-to-noise ratio
for fixed-point representation depends on the input level. This signal-to-noise ratio in the
digital domain is an exact image of the level-dependent signal-to-noise ratio of an analog
signal in the analog domain. A floating-point representation cannot improve this
signal-to-noise ratio. Rather, the floating-point curve is vertically shifted downwards to the
value of signal-to-noise ratio of an analog signal.
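The level dependence described above can be reproduced with a short simulation. The sketch
below is illustrative only and rests on several assumptions: a uniformly distributed random
test signal, rounding quantizers without saturation, an unrestricted exponent range for the
floating-point quantizer, and two input levels that differ by the factor $2^{-7}$ (about
42 dB) so that the sawtooth ripple of the floating-point curve does not affect the
comparison. All function names are hypothetical.

```python
import math
import random


def q_fixed(x, w):
    """Fixed-point rounding with step size 2**-(w-1)."""
    q = 2.0 ** -(w - 1)
    return q * math.floor(x / q + 0.5)


def q_float(x, w_m):
    """Floating-point rounding of the mantissa to w_m bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    q = 2.0 ** -(w_m - 1)
    return math.ldexp(q * math.floor(m / q + 0.5), e)


def snr_db(x, xq):
    """Signal-to-noise ratio in dB from signal power and error power."""
    p_sig = sum(v * v for v in x)
    p_err = sum((a - b) ** 2 for a, b in zip(xq, x))
    return 10.0 * math.log10(p_sig / p_err)


random.seed(1)
base = [random.uniform(-1.0, 1.0) for _ in range(20000)]
results = {}
for shift in (0, 7):                       # full level and about -42 dB
    x = [v * 2.0 ** -shift for v in base]
    results[shift] = (
        snr_db(x, [q_fixed(v, 16) for v in x]),    # fixed-point, w = 16
        snr_db(x, [q_float(v, 16) for v in x]),    # floating-point, w_M = 16
    )
    print(shift, results[shift])
```

In this setup the fixed-point signal-to-noise ratio drops by about as many dB as the input
level, whereas the floating-point value stays the same.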

AD/DA Conversion. Before processing, storing and transmission of audio signals, the 
analog audio signal is converted into a digital signal. The precision of this conversion 
depends on the word-length w of the AD converter. The resulting signal-to-noise ratio 
is 6w dB for uniform PDF inputs. The signal-to-noise ratio in the analog domain depends 
on the level. This linear dependence of signal-to-noise ratio on level is preserved after AD 
conversion with subsequent fixed-point representation. 

Digital Audio Formats. The basis for established digital audio transmission formats is
provided in the previous section on AD/DA conversion. The digital two-channel AES/EBU
interface [AES92] and 56-channel MADI interface [AES91] both operate with fixed-point
representation with a word-length of at most 24 bits per channel.

Figure 2.43 Signal-to-noise ratio for an input level ($w_M = 16$, $w_E = 4$).

Storage and Transmission. Besides the established storage media like compact disc and 
DAT which were exclusively developed for audio application, there are storage systems like 
hard discs in computers. These are based on magnetic or magneto-optic principles. The 
systems operate with fixed-point number representation. With regard to the transmission 
of digital audio signals for band-limited transmission channels like satellite broadcasting 
(Digital Satellite Radio, DSR) or terrestrial broadcasting, it is necessary to reduce bit rates. 
For this, a conversion of a block of linearly coded samples is carried out in a so-called 
block floating-point representation in DSR. In the context of DAB, a data reduction of 
linear coded samples is carried out based on psychoacoustic criteria. 

Equalizers. While implementing equalizers with recursive digital filters, the signal-to- 
noise ratio depends on the choice of the recursive filter structure. By a suitable choice 
of a filter structure and methods to spectrally shape the quantization errors, optimal signal- 
to-noise ratios are obtained for a given word-length. The signal-to-noise ratio for fixed- 
point representation depends on the word-length and for floating-point representation on the 
word-length of the mantissa. For filter implementations with fixed-point arithmetic, boost 
filters have to be implemented with a scaling within the filter algorithm. The properties of 
floating-point representation take care of automatic scaling in boost filters. If an insert I/O 
in fixed-point representation follows a boost filter in floating-point representation then the 
same scaling as in fixed-point arithmetic has to be done. 

Dynamic Range Control. Dynamic range control is performed by a simple multiplicative
weighting of the input signal with a control factor. The latter follows from calculating the
peak and RMS value (root mean square) of the input signal. The number representation
of the signal has no influence on the properties of the algorithm. Owing to the normalized
mantissa in floating-point representation, some simplifications are produced while
determining the control factor.

Mixing/Summation. While mixing signals into a stereo image, only multiplications and 
additions occur. Under the assumption of incoherent signals, an overload reserve can 
be estimated. This implies a reserve of 20/30 dB for 48/96 sources. For fixed-point 
representation the overload reserve is provided by a number of overflow bits in the 
accumulator of a DSP (Digital Signal Processor). The properties of automatic scaling in 
floating-point arithmetic provide for overload reserves. For both number representations, 
the summation signal must be matched with the number representation of the output. While 
dealing with AES/EBU outputs or MADI outputs, both number representations are adjusted 
to fixed-point format. Similarly, within heterogeneous system solutions, it is logical to 
make heterogeneous use of both number representations, though corresponding number 
representations have to be converted. 

Since the signal-to-noise ratio in fixed-point representation depends on the input level, 
a conversion from fixed-point to floating-point representation does not lead to a change of 
signal-to-noise ratio, i.e., the conversion does not improve the signal-to-noise ratio. Further
signal processing with floating-point or fixed-point arithmetic does not alter the signal-to- 
noise ratio as long as the algorithms are chosen and programmed accordingly. Reconversion 
from floating-point to fixed-point representation again leads to a level-dependent signal-to- 
noise ratio. 

As a consequence, for two-channel DSP systems which operate with AES/EBU or with 
analog inputs and outputs, and which are used for equalization, dynamic range control, 
room simulation etc., the above-mentioned discussion holds. These conclusions are also 
valid for digital mixing consoles for which digital inputs from AD converters or from 
multitrack machines are represented in fixed-point format (AES/EBU or MADI). The 
number representation for inserts and auxiliaries is specific to a system. Digital AES/EBU 
(or MADI) inputs and outputs are realized in fixed-point number representation. 



2.5 Java Applet - Quantization, Dither, and Noise Shaping

This applet shown in Fig. 2.44 demonstrates audio effects resulting from quantization. It is 
designed for a first insight into the perceptual effects of quantizing an audio signal. 

The following functions can be selected on the lower right of the graphical user 
interface: 

• Quantizer

- word-length $w$ leads to quantization step size $Q = 2^{-(w-1)}$.

• Dither

- rect dither - uniform probability density function

- tri dither - triangular probability density function

- high-pass dither - triangular probability density function and high-pass power
spectral density.

• Noise shaping

- first-order $H(z) = z^{-1}$

- second-order $H(z) = -2z^{-1} + z^{-2}$

- psychoacoustic noise shaping.

You can choose between two predefined audio files from our web server (audio1.wav or
audio2.wav) or your own local wav file to be processed [Gui05].







Figure 2.44 Java applet - quantization, dither, and noise shaping. 



2.6 Exercises 

1. Quantization 

1. Consider a 100 Hz sine wave $x(n)$ sampled with $f_S = 44.1$ kHz, $N = 1024$ samples
and $w = 3$ bits (word-length). What is the number of quantization levels? What
is the quantization step $Q$ when the signal is normalized to $-1 \le x(n) < 1$? Show
graphically how quantization is performed. What is the maximum error for this 3-bit
quantizer? Write Matlab code for quantization with rounding and truncation.

2. Derive the mean value, the variance and the peak factor $P_F$ of sequence $e(n)$, if
the signal has a uniform probability density function in the range $-Q/2 < e(n) < Q/2$.
Derive the signal-to-noise ratio for this case. What will happen if we increase
our word-length by one bit?

3. As the input signal level decreases from maximum amplitude to very low amplitudes,
the error signal becomes more audible. Describe the error calculated above when $w$
decreases to 1 bit. Is the classical quantization model still valid? What can be done
to avoid this distortion?

4. Write Matlab code for a quantizer with $w = 16$ bits with rounding and truncation.

• Plot the nonlinear transfer characteristic and the error signal when the input
signal covers the range $-3Q \le x(n) \le 3Q$.

• Consider the sine wave $x(n) = A \sin(2\pi (f/f_S) n)$, $n = 0, \ldots, N-1$, with
$A = Q$, $f/f_S = 64/N$ and $N = 1024$. Plot the output signal ($n = 0, \ldots, 99$)
of a quantizer with rounding and truncation in the time domain and the
frequency domain.

• Compute for both quantization types the quantization error and the
signal-to-noise ratio.



2. Dither 

1. What is dither and when do we have to use it?

2. How do we perform dither and what kinds of dither are there?

3. How do we obtain a triangular high-pass dither and why do we prefer it to other
dithers?

4. Matlab: Generate corresponding dither signals for rectangular, triangular and
triangular high-pass.

5. Plot the amplitude distribution and the spectrum of the output $x_Q(n)$ of a quantizer
for every dither type.

3. Noise Shaping 

1. What is noise shaping and when do we do it?

2. Why is it necessary to dither during noise shaping and how do we do this?

3. Matlab: The first noise shaper used is without dither and assumes that the transfer
function in the feedback structure can be first-order $H(z) = z^{-1}$ or second-order
$H(z) = -2z^{-1} + z^{-2}$. Plot the output $x_Q(n)$ and the error signal $e(n)$ and its
spectrum. Show with a plot the shape of the error signal.

4. The same noise shaper is now used with a dither signal. Is it really necessary to dither
with noise shaping? Where would you add your dither in the flow graph to achieve
better results?

5. In the feedback structure we now use a psychoacoustic-based noise shaper which
uses the Wannamaker filter coefficients

$h_3$ = [1.623, -0.982, 0.109],

$h_5$ = [2.033, -2.165, 1.959, -1.590, 0.6149],

$h_9$ = [2.412, -3.370, 3.937, -4.174, 3.353, -2.205, 1.281, -0.569, 0.0847].

Show with a Matlab plot the shape of the error with this filter.



References 

[DeK03] D. De Koning, W. Verhelst: On Psychoacoustic Noise Shaping for Audio
Requantization, Proc. ICASSP-03, Vol. 5, pp. 453-456, April 2003.

[Ger89] M. A. Gerzon, P. G. Craven: Optimal Noise Shaping and Dither of Digital
Signals, Proc. 87th AES Convention, Preprint No. 2822, New York, October
1989.

[Gui05] M. Guillemard, C. Ruwwe, U. Zölzer: J-DAFx - Digital Audio Effects in Java,
Proc. 8th Int. Conference on Digital Audio Effects (DAFx-05), pp. 161-166,
Madrid, 2005.

[Hel07] C. R. Helmrich, M. Holters, U. Zölzer: Improved Psychoacoustic Noise
Shaping for Requantization of High-Resolution Digital Audio, AES 31st
International Conference, London, June 2007.

[ISO389] ISO 389-7:2005: Acoustics - Reference Zero for the Calibration of Audiometric
Equipment - Part 7: Reference Threshold of Hearing Under Free-Field and
Diffuse-Field Listening Conditions, Geneva, 2005.

[Lip86] S. P. Lipshitz, J. Vanderkooy: Digital Dither, Proc. 81st AES Convention,
Preprint No. 2412, Los Angeles, November 1986.

[Lip92] S. P. Lipshitz, R. A. Wannamaker, J. Vanderkooy: Quantization and Dither:
A Theoretical Survey, J. Audio Eng. Soc., Vol. 40, No. 5, pp. 355-375, May
1992.

[Lip93] S. P. Lipshitz, R. A. Wannamaker, J. Vanderkooy: Dithered Noise Shapers
and Recursive Digital Filters, Proc. 94th AES Convention, Preprint No. 3515,
Berlin, March 1993.

[Sha48] C. E. Shannon: A Mathematical Theory of Communication, Bell System
Techn. J., pp. 379-423, 623-656, 1948.

[Sri77] A. B. Sripad, D. L. Snyder: A Necessary and Sufficient Condition for
Quantization Errors to be Uniform and White, IEEE Trans. ASSP, Vol. 25,
pp. 442-448, October 1977.

[Ter79] E. Terhardt: Calculating Virtual Pitch, Hearing Res., Vol. 1, pp. 155-182, 1979.

[Van89] J. Vanderkooy, S. P. Lipshitz: Digital Dither: Signal Processing with Resolution
Far Below the Least Significant Bit, Proc. AES Int. Conf. on Audio in Digital
Times, pp. 87-96, May 1989.

[Wan92] R. A. Wannamaker: Psychoacoustically Optimal Noise Shaping, J. Audio Eng.
Soc., Vol. 40, No. 7/8, pp. 611-620, July/August 1992.

[Wan00] R. A. Wannamaker, S. P. Lipshitz, J. Vanderkooy, J. N. Wright: A Theory of
Nonsubtractive Dither, IEEE Trans. Signal Processing, Vol. 48, No. 2,
pp. 499-516, 2000.

[Wid61] B. Widrow: Statistical Analysis of Amplitude-Quantized Sampled-Data
Systems, Trans. AIEE, Pt. II, Vol. 79, pp. 555-568, January 1961.




Chapter 3 

AD/DA Conversion 



The conversion of a continuous-time function $x(t)$ (voltage, current) into a sequence of
numbers $x(n)$ is called analog-to-digital conversion (AD conversion). The reverse process
is known as digital-to-analog conversion (DA conversion). The time-sampling of a function
$x(t)$ is described by Shannon’s sampling theorem. This states that a continuous-time signal
with bandwidth $f_B$ can be sampled with a sampling rate $f_S > 2 f_B$ without changing
the information content in the signal. The original analog signal is reconstructed by
low-pass filtering with bandwidth $f_B$. Besides time-sampling, the nonlinear procedure of
digitizing the continuous-valued amplitude (quantization) of the sampled signal occurs. In
Section 3.1 basic concepts of Nyquist sampling, oversampling and delta-sigma modulation
are presented. In Sections 3.2 and 3.3 principles of AD and DA converter circuits are
discussed.



3.1 Methods 

3.1.1 Nyquist Sampling 

The sampling of a signal with sampling rate $f_S > 2 f_B$ is called Nyquist sampling. The
schematic diagram in Fig. 3.1 shows the procedure. The band-limiting of the input at $f_S/2$
is carried out by an analog low-pass filter (Fig. 3.1a). The following sample-and-hold circuit
samples the band-limited input at a sampling rate $f_S$. The constant amplitude of the time
function over the sampling period $T_S = 1/f_S$ is converted to a number sequence $x(n)$ by
a quantizer (Fig. 3.1b). This number sequence is fed to a digital signal processor (DSP)
which performs signal processing algorithms. The output sequence $y(n)$ is delivered to a
DA converter which gives a staircase as its output (Fig. 3.1c). Following this, a low-pass
filter gives the analog output $y(t)$ (Fig. 3.1d). Figure 3.2 demonstrates each step of AD/DA
conversion in the frequency domain. The individual spectra in Fig. 3.2a-d correspond to
the outputs in Fig. 3.1a-d.

After band-limiting (Fig. 3.2a) and sampling, a periodic spectrum with period $f_S$
of the sampled signal is obtained as shown in Fig. 3.2b. Assuming that consecutive
quantization errors $e(n)$ are statistically independent of each other, the noise power has
a spectral uniform distribution in the frequency domain $0 \le f \le f_S$. The output of the DA
converter still has a periodic spectrum. However, this is weighted with the sinc function
(sinc = sin(x)/x) of the sample-and-hold circuit (Fig. 3.2c). The zeros of the sinc function
are at multiples of the sampling rate $f_S$. In order to reconstruct the output (Fig. 3.2d), the
image spectra are eliminated by an analog low-pass of sufficient stop-band attenuation (see
Fig. 3.2c).

Figure 3.1 Schematic diagram of Nyquist sampling.

Figure 3.2 Nyquist sampling - interpretation in the frequency domain.

The problems of Nyquist sampling lie in the steep band-limiting filter characteristics
(anti-aliasing filter) of the analog input filter and the analog reconstruction filter
(anti-imaging filter) of similar filter characteristics and sufficient stop-band attenuation.
Further, sinc distortion due to the sample-and-hold circuit needs to be compensated for.



3.1.2 Oversampling 

In order to increase the resolution of the conversion process and reduce the complexity
of analog filters, oversampling techniques are employed. Owing to the spectral uniform
distribution of quantization error between 0 and $f_S$ (see Fig. 3.3a), it is possible to reduce
the power spectral density in the pass-band $0 \le f \le f_B$ through oversampling by a factor
$L$, i.e. with the new sampling rate $L f_S$ (see Fig. 3.3b). For identical quantization step size
$Q$, the shaded areas (quantization error power $\sigma_E^2$) in Fig. 3.3a and Fig. 3.3b are equal.
The increase in the signal-to-noise ratio can also be observed in Fig. 3.3.

Figure 3.3 Influence of oversampling and delta-sigma technique on power spectral density of
quantization error and on input sinusoid with frequency $f_1$.



It follows that in the pass-band at a sampling rate of $f_S = 2 f_B$ the power spectral
density given by

$$S_{EE}(f) = \frac{Q^2}{12 f_S} \tag{3.1}$$

leads to the noise power

$$N_B^2 = \sigma_E^2 = 2 \int_0^{f_B} S_{EE}(f)\, df = \frac{Q^2}{12}. \tag{3.2}$$

Owing to oversampling by a factor of $L$, a reduction of the power spectral density given by

$$S_{EE}(f) = \frac{Q^2}{12 L f_S} \tag{3.3}$$

is obtained (see Fig. 3.3b). With $f_S = 2 f_B$, the error power in the audio band is given by

$$N_B^2 = 2 f_B\, \frac{Q^2}{12 L f_S} = \frac{Q^2}{12} \cdot \frac{1}{L}. \tag{3.4}$$

The signal-to-noise ratio (with $P_F = \sqrt{3}$) owing to oversampling can now be expressed as

$$\mathrm{SNR} = 6.02 \cdot w + 10\log_{10}(L)\ \mathrm{dB}. \tag{3.5}$$
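A quick evaluation of (3.4) and (3.5) shows how much resolution oversampling alone can buy.
The following sketch is illustrative; the function names `inband_noise_power`,
`snr_oversampling_db` and `equivalent_extra_bits` are assumptions, and the conversion of the
gain into bits simply divides by the 6.02 dB per bit used in (3.5).

```python
import math


def inband_noise_power(q, oversampling):
    """Noise power in the audio band after oversampling by L, following (3.4)."""
    return q * q / (12.0 * oversampling)


def snr_oversampling_db(w, oversampling):
    """Signal-to-noise ratio in dB for word-length w and oversampling factor L, (3.5)."""
    return 6.02 * w + 10.0 * math.log10(oversampling)


def equivalent_extra_bits(oversampling):
    """Gain of (3.5) expressed in bits (6.02 dB per bit)."""
    return 10.0 * math.log10(oversampling) / 6.02


for factor in (1, 4, 16, 64, 256):
    print(factor, round(snr_oversampling_db(16, factor), 2),
          round(equivalent_extra_bits(factor), 2))
```

Each quadrupling of the sampling rate improves the signal-to-noise ratio by about 6 dB, which
corresponds to one additional bit of resolution.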

Figure 3.4a shows a schematic diagram of an oversampling AD converter. Owing
to oversampling, the analog band-limiting low-pass filter can have a wider transition
bandwidth as shown in Fig. 3.4b. The quantization error power is distributed between 0 and
the sampling rate $L f_S$. To reduce the sampling rate, it is necessary to limit the bandwidth
with a digital low-pass filter (see Fig. 3.4c). After this, the sampling rate is reduced by a
factor $L$ (see Fig. 3.4d) by taking every $L$th output sample of the digital low-pass filter
[Cro83, Vai93, Fli00].

Figure 3.5a shows a schematic diagram of an oversampling DA converter. The sampling
rate is first increased by a factor of $L$. For this purpose, $L - 1$ zeros are introduced between
two consecutive input values [Cro83, Vai93, Fli00]. The following digital filter eliminates
all image spectra (Fig. 3.5b) except the base-band spectrum and spectra at multiples of
$L f_S$ (Fig. 3.5c). It interpolates $L - 1$ samples between two input samples. The $w$-bit DA
converter operates at a sampling rate $L f_S$. Its output is fed to an analog reconstruction filter
which eliminates the image spectra at multiples of $L f_S$.



3.1.3 Delta-sigma Modulation 

Delta-sigma modulation using oversampling is a conversion strategy derived from delta
modulation. In delta modulation (Fig. 3.6a), the difference between the input $x(t)$ and the
signal $x_1(t)$ is converted into a 1-bit signal $y(n)$ at a very high sampling rate $L f_S$. The
sampling rate is higher than the necessary Nyquist rate $f_S$. The quantized signal $y(n)$ gives
the signal $x_1(t)$ via an analog integrator. The demodulator consists of an integrator and a
reconstruction low-pass filter.

The extension to delta-sigma modulation [Ino63] involves shifting the integrator from 
the demodulator to the input of the modulator (see Fig. 3.6b). With this, it is possible 
to combine the two integrators as a single integrator after addition (see Fig. 3.7a). The 
corresponding signals are shown in Fig. 3.8. 
Figure 3.4 Oversampling AD converter and sampling rate reduction.



A time-discrete model of the delta-sigma modulator is given in Fig. 3.7b. The Z-transform
of the output signal $y(n)$ is given by

$$Y(z) = \frac{H(z)}{1 + H(z)} X(z) + \frac{1}{1 + H(z)} E(z) \approx X(z) + \frac{1}{1 + H(z)} E(z). \tag{3.6}$$

For a large gain factor of the system $H(z)$, the input signal will not be affected. In contrast,
the quantization error is shaped by the filter term $1/[1 + H(z)]$.

Schematic diagrams of delta-sigma AD/DA conversion are shown in Figs 3.9 and 3.10.
For delta-sigma AD converters, a digital low-pass filter and a downsampler with factor $L$
are used to reduce the sampling rate $L f_S$ to $f_S$. The 1-bit input to the digital low-pass filter
leads to a $w$-bit output $x(n)$ at a sampling rate $f_S$. The delta-sigma DA converter consists
of an upsampler with factor $L$, a digital low-pass filter to eliminate the mirror spectra and
a delta-sigma modulator followed by an analog reconstruction low-pass filter. In order to
illustrate noise shaping in delta-sigma modulation in detail, first- and second-order systems
as well as multistage techniques are investigated in the following sections.

Figure 3.5 Oversampling and DA conversion.



First-order Delta-sigma Modulator 

A time-discrete model of a first-order delta-sigma modulator is shown in Fig. 3.11.
The difference equation for the output $y(n)$ is given by

$$y(n) = x(n-1) + e(n) - e(n-1). \tag{3.7}$$

The corresponding Z-transform leads to

$$Y(z) = z^{-1} X(z) + E(z) \underbrace{(1 - z^{-1})}_{H_E(z)}. \tag{3.8}$$

The power density spectrum of the error signal $e_1(n) = e(n) - e(n-1)$ is

$$S_{E_1E_1}(e^{j\Omega}) = S_{EE}(e^{j\Omega}) \, |1 - e^{-j\Omega}|^2 = S_{EE}(e^{j\Omega}) \, 4 \sin^2\left(\frac{\Omega}{2}\right), \tag{3.9}$$
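
The difference equation (3.7) can be checked with a small simulation. The following Python
sketch is a minimal model under stated assumptions: a delaying integrator as loop filter, a
1-bit quantizer with output levels $\pm 1$ and a constant test input. None of these details
are specified in the text above, and the function name is purely illustrative.

```python
def first_order_delta_sigma(x):
    """Simulate a first-order delta-sigma modulator with a 1-bit quantizer.

    Returns the output y(n) in {-1, +1} and the quantization error e(n).
    """
    v = 0.0          # integrator state (quantizer input)
    x_prev = 0.0     # x(n-1)
    y_prev = 0.0     # y(n-1)
    y, e = [], []
    for xn in x:
        v += x_prev - y_prev               # integrate the delayed difference
        yn = 1.0 if v >= 0.0 else -1.0     # 1-bit quantizer (assumed levels +/-1)
        y.append(yn)
        e.append(yn - v)                   # e(n) = y(n) - quantizer input
        x_prev, y_prev = xn, yn
    return y, e

x = [0.3] * 10000                  # constant input inside the range (-1, 1)
y, e = first_order_delta_sigma(x)
print(sum(y) / len(y))             # the 1-bit stream averages to roughly the input level
```

In this model the output obeys $y(n) = x(n-1) + e(n) - e(n-1)$ sample by sample, so the
quantization error only appears in differentiated form, which is the noise shaping
expressed by $H_E(z) = 1 - z^{-1}$.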
Figure 3.7 Delta-sigma modulation and time-discrete model.



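where $S_{EE}(e^{j\Omega})$ denotes the power density spectrum of the quantization error $e(n)$. The
error power in the frequency band $[-f_B, f_B]$, with $S_{EE}(f) = Q^2/(12 L f_S)$, can be written
as

$$N_B^2 = S_{EE}(f) \, 2 \int_0^{f_B} 4 \sin^2\left(\pi \frac{f}{L f_S}\right) df \tag{3.10}$$

$$\approx \frac{Q^2}{12} \frac{\pi^2}{3} \left(\frac{2 f_B}{L f_S}\right)^3. \tag{3.11}$$

With $f_S = 2 f_B$, we get

$$N_B^2 = \frac{Q^2}{12} \frac{\pi^2}{3} \left(\frac{1}{L}\right)^3. \tag{3.12}$$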
Figure 3.10 Oversampling delta-sigma DA converter.

Figure 3.11 Time-discrete model of a first-order delta-sigma modulator.



Second-order Delta-sigma Modulator 

For the second-order delta-sigma modulator [Can85], shown in Fig. 3.12, the difference
equation is expressed as

$$y(n) = x(n-1) + e(n) - 2e(n-1) + e(n-2) \tag{3.13}$$

Figure 3.12 Time-discrete model of a second-order delta-sigma modulator.

and the Z-transform is given by

$$Y(z) = z^{-1} X(z) + E(z) \underbrace{(1 - 2z^{-1} + z^{-2})}_{H_E(z) = (1 - z^{-1})^2}. \tag{3.14}$$

The power density spectrum of the error signal $e_1(n) = e(n) - 2e(n-1) + e(n-2)$ can
be written as

$$S_{E_1E_1}(e^{j\Omega}) = S_{EE}(e^{j\Omega}) \, |1 - e^{-j\Omega}|^4 = S_{EE}(e^{j\Omega}) \left[4 \sin^2\left(\frac{\Omega}{2}\right)\right]^2 = S_{EE}(e^{j\Omega}) \, 4[1 - \cos(\Omega)]^2. \tag{3.15}$$
The error power in the frequency band $[-f_B, f_B]$ is given by

$$N_B^2 = S_{EE}(f) \, 2 \int_0^{f_B} 4[1 - \cos(\Omega)]^2 \, df \tag{3.16}$$

$$\approx \frac{Q^2}{12} \frac{\pi^4}{5} \left(\frac{2 f_B}{L f_S}\right)^5, \tag{3.17}$$

and with $f_S = 2 f_B$ we obtain

$$N_B^2 = \frac{Q^2}{12} \frac{\pi^4}{5} \left(\frac{1}{L}\right)^5. \tag{3.18}$$



Multistage Delta-sigma Modulator

A multistage delta-sigma modulator (MASH, [Mat87]) is shown in Fig. 3.13.

Figure 3.13 Time-discrete model of a multistage delta-sigma modulator.

The Z-transforms of the output signals $y_i(n)$, $i = 1, 2, 3$, are given by

$$Y_1(z) = X(z) + (1 - z^{-1}) E_1(z), \tag{3.19}$$

$$Y_2(z) = -E_1(z) + (1 - z^{-1}) E_2(z), \tag{3.20}$$

$$Y_3(z) = -E_2(z) + (1 - z^{-1}) E_3(z). \tag{3.21}$$
The Z-transform of the output obtained by addition and filtering leads to

$$\begin{aligned}
Y(z) &= Y_1(z) + (1 - z^{-1}) Y_2(z) + (1 - z^{-1})^2 Y_3(z) \\
&= X(z) + (1 - z^{-1}) E_1(z) - (1 - z^{-1}) E_1(z) \\
&\quad + (1 - z^{-1})^2 E_2(z) - (1 - z^{-1})^2 E_2(z) + (1 - z^{-1})^3 E_3(z) \\
&= X(z) + \underbrace{(1 - z^{-1})^3}_{H_E(z)} E_3(z).
\end{aligned} \tag{3.22}$$

The error power in the frequency band $[-f_B, f_B]$,

$$N_B^2 = \frac{Q^2}{12} \frac{\pi^6}{7} \left(\frac{2 f_B}{L f_S}\right)^7, \tag{3.23}$$

with $f_S = 2 f_B$, gives the total noise power

$$N_B^2 = \frac{Q^2}{12} \frac{\pi^6}{7} \left(\frac{1}{L}\right)^7. \tag{3.24}$$
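
The closed-form noise powers (3.12), (3.18) and (3.24) rest on a small-angle approximation
of $|1 - e^{-j\Omega}|^{2K}$ inside the audio band. The following Python sketch compares them
with a direct numerical integration. It is only an illustrative check: the function names,
the midpoint rule and the normalization to $Q^2/12$ are choices made here, not taken from
the text.

```python
import math

def inband_noise_exact(K, L, steps=20000):
    """Normalized in-band noise power N_B^2 / (Q^2/12) for H_E(z) = (1 - z^-1)^K.

    Midpoint-rule integration of |H_E|^2 over 0 <= f <= f_B with f_S = 2 f_B,
    using the normalized frequency u = f / (L f_S).
    """
    upper = 1.0 / (2.0 * L)
    du = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += (2.0 * math.sin(math.pi * u)) ** (2 * K)
    return 2.0 * total * du

def inband_noise_approx(K, L):
    """Closed-form approximation as in (3.12), (3.18) and (3.24)."""
    return math.pi ** (2 * K) / (2 * K + 1) * L ** -(2 * K + 1)

for K in (1, 2, 3):
    print(K, inband_noise_exact(K, 64), inband_noise_approx(K, 64))
```

For oversampling factors of practical interest the two values differ only marginally, so the
approximation is adequate for estimating the signal-to-noise ratio.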



The error transfer functions in Fig. 3.14 show the noise shaping for three types of delta- 
sigma modulations as discussed before. The error power is shifted toward higher frequen- 
cies. 
Figure 3.14 $H_E(z) = (1 - z^{-1})^K$ with $K = 1, 2, 3$.

The improvement of signal-to-noise ratio by pure oversampling and delta-sigma modulation
(first, second and third order) is shown in Fig. 3.15. For the general case of a $k$th-order
delta-sigma conversion with oversampling factor $L$ one can derive the signal-to-noise
ratio as

$$\mathrm{SNR} = 6.02 \cdot w - 10 \log_{10}\left(\frac{\pi^{2k}}{2k + 1}\right) + (2k + 1) \, 10 \log_{10}(L) \ \mathrm{dB}. \tag{3.25}$$
Here $w$ denotes the quantizer word-length of the delta-sigma modulator. The signal quantization
after digital low-pass filtering and downsampling by $L$ can be performed with (3.25)
according to the relation $w = \mathrm{SNR}/6$.
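
Equation (3.25) and the rule $w = \mathrm{SNR}/6$ are easy to tabulate. The following Python
sketch does so for a 1-bit modulator and $L = 64$. The function names are illustrative, and
$k = 0$ is used here to denote pure oversampling, for which (3.25) reduces to (3.5).

```python
import math

def snr_delta_sigma(w, k, L):
    """SNR in dB for a k-th order delta-sigma converter, following (3.25)."""
    shaping_term = 10.0 * math.log10(math.pi ** (2 * k) / (2 * k + 1))
    return 6.02 * w - shaping_term + (2 * k + 1) * 10.0 * math.log10(L)

def equivalent_word_length(snr_db):
    """Word-length after decimation according to the rule w = SNR/6."""
    return snr_db / 6.0

for k in (0, 1, 2, 3):
    snr = snr_delta_sigma(1, k, 64)
    print(k, round(snr, 1), round(equivalent_word_length(snr), 1))
```

The table makes the trade-off visible: raising the modulator order increases the slope of the
SNR over $\log_{10}(L)$, so a given resolution is reached with a much smaller oversampling factor.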




Figure 3.15 Improvement of signal-to-noise ratio as a function of oversampling and noise shaping
($L = 2^x$).



Higher-order Delta-sigma Modulator 

A widening of the stop-band for the high-pass transfer function of the quantization error 
is achieved with higher-order delta-sigma modulation [Cha90]. Besides the zeros at z = 1, 
additional zeros are placed on the unit circle. Also, poles are integrated into the trans- 
fer function. A time-discrete model of a higher-order delta-sigma modulator is shown in 
Fig. 3.16. 



Figure 3.16 Higher-order delta-sigma modulator.
The transfer function in Fig. 3.16 can be written as

$$H(z) = \frac{A_0 (z-1)^N + A_1 (z-1)^{N-1} + \cdots + A_N}{(z-1)^N - B_1 (z-1)^{N-1} - \cdots - B_N} = \frac{\sum_{i=0}^{N} A_i (z-1)^{N-i}}{(z-1)^N - \sum_{i=1}^{N} B_i (z-1)^{N-i}}. \tag{3.26}$$

The Z-transform of the output is given by

$$Y(z) = \frac{H(z)}{1 + H(z)} X(z) + \frac{1}{1 + H(z)} E(z) \tag{3.27}$$

$$= H_X(z) X(z) + H_E(z) E(z). \tag{3.28}$$

The transfer function for the input is

$$H_X(z) = \frac{\sum_{i=0}^{N} A_i (z-1)^{N-i}}{(z-1)^N - \sum_{i=1}^{N} B_i (z-1)^{N-i} + \sum_{i=0}^{N} A_i (z-1)^{N-i}} \tag{3.29}$$

and the transfer function for the error signal is given by

$$H_E(z) = \frac{(z-1)^N - \sum_{i=1}^{N} B_i (z-1)^{N-i}}{(z-1)^N - \sum_{i=1}^{N} B_i (z-1)^{N-i} + \sum_{i=0}^{N} A_i (z-1)^{N-i}}. \tag{3.30}$$



For Butterworth or Chebyshev filter designs, the frequency responses as shown in Fig. 3.17 
are obtained for the error transfer functions. As a comparison, the frequency responses 
of first-, second- and third-order delta-sigma modulation are shown. The widening of the 
stop-band for Butterworth and Chebyshev filters can be observed from Fig. 3.18. 



Decimation Filter 

Decimation filters for AD conversion and interpolation filters for DA conversion are
implemented with multirate systems [Fli00]. The necessary downsampler and upsampler are
simple systems. For the former, every $n$th sample is taken out of the input sequence. For the
latter, $n - 1$ zeros are inserted between two input samples. For decimation, band-limiting
is performed by $H(z)$ followed by sampling rate reduction by a factor $L$. This procedure
can be implemented in stages (see Fig. 3.19). The use of easy-to-implement filter structures
at high sampling rates, like comb filters with transfer function

$$H_1(z) = \frac{1}{L} \, \frac{1 - z^{-L}}{1 - z^{-1}} \tag{3.31}$$

(shown in Fig. 3.20), allows simple implementation needing only delay systems and
additions. In order to increase the stop-band attenuation, a series of comb filters is used so
that

$$H_1^M(z) = \left[\frac{1}{L} \, \frac{1 - z^{-L}}{1 - z^{-1}}\right]^M \tag{3.32}$$

is obtained.

Figure 3.17 Comparison of different transfer functions of error signal.

Figure 3.18 Transfer function of the error signal in stop-band.

Besides additions at high sampling rates, complexity can be reduced further. Owing
to sampling rate reduction by a factor of $L$, the numerator $(1 - z^{-L})$ can be moved so
that it is placed after the downsampler (see Fig. 3.21), where it becomes $(1 - z^{-1})$ at
the low sampling rate. For a series of comb filters, the
structure in Fig. 3.22 results. $M$ simple recursive accumulators have to be performed at
the high sampling rate $L f_S$. After this, downsampling by a factor $L$ is carried out. The $M$
nonrecursive systems are calculated with the output sampling rate $f_S$.

Figure 3.19 Several stages of sampling rate reduction.

Figure 3.20 Signal flow diagram of a comb filter.

Figure 3.21 Comb filter for sampling rate reduction.

Figure 3.22 Series of comb filters for sampling rate reduction.
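
The series structure described above (accumulators at the high rate, downsampling, differences
at the low rate) can be sketched in a few lines of Python. This is a minimal illustration with
zero initial states; the function name `comb_decimate`, the choice of which sample is kept in
each block of $L$ samples and the scaling by $1/L^M$ at the output are assumptions made here.

```python
def comb_decimate(x, L, M):
    """Decimate x by L with a series of M comb filters (structure as in Fig. 3.22).

    M accumulators run at the high rate, every L-th value is kept, and
    M differences (1 - z^-1) run at the low rate. The gain is scaled by 1/L^M.
    """
    acc = [0] * M         # accumulator states (high rate)
    delay = [0] * M       # delay states of the difference stages (low rate)
    out = []
    for n, xn in enumerate(x):
        value = xn
        for i in range(M):            # recursive part 1/(1 - z^-1), M times
            acc[i] += value
            value = acc[i]
        if n % L == 0:                # downsampling by L
            for i in range(M):        # nonrecursive part, now (1 - z^-1)
                value, delay[i] = value - delay[i], value
            out.append(value / L ** M)
    return out

print(comb_decimate([1] * 32, L=4, M=3))   # settles to 1.0 (unity DC gain)
```

With integer input the accumulators grow without bound for a DC signal, so a practical
implementation would rely on wrap-around arithmetic with sufficient word-length; Python
integers hide this detail.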

Figure 3.23a shows the frequency responses of a series of comb filters ($L = 16$). Figure 3.23b
shows the resulting frequency response for the quantization error of a third-order
delta-sigma modulator connected in series with a comb filter $H_1^M(z)$. The system delay
owing to filtering and sampling rate reduction is given by

$$t_D = \frac{N - 1}{2} \, \frac{1}{L f_S}. \tag{3.33}$$

Figure 3.23 (a) Transfer function $H_1^M(z) = \left[\frac{1}{L} \frac{1 - z^{-L}}{1 - z^{-1}}\right]^M$ with $M = 1 \ldots 4$. (b) Third-order delta-sigma
modulation and in series with $H_1^M(z)$.



Example: Delay time of conversion process (latency time)

1. Nyquist conversion

$f_S = 48$ kHz
$t_D = 1/f_S = 20.83$ µs.

2. Delta-sigma modulation with single-stage downsampling

$L = 64$
$f_S = 48$ kHz
$N = 4096$
$t_D = 665$ µs.

3. Delta-sigma modulation with two-stage downsampling

$L = 64$
$f_S = 48$ kHz
$L_1 = 16$
$L_2 = 4$
$N_1 = 61$
$N_2 = 255$
$t_{D1} = 9.76$ µs
$t_{D2} = 662$ µs.
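
The delay values of this example can be reproduced approximately with a short Python sketch.
It assumes that each linear-phase filter of length $N$ contributes $(N - 1)/2$ sampling periods
at its own input rate, and that the second stage runs at $L_2 f_S$; both are interpretations
made here. The results agree with the listed values to within about 2 µs.

```python
def filter_delay(N, rate_hz):
    """Delay in seconds of a linear-phase filter with N taps clocked at rate_hz."""
    return (N - 1) / 2.0 / rate_hz

fs = 48_000.0
L, L1, L2 = 64, 16, 4

t_nyquist = 1.0 / fs                          # one sampling period
t_single = filter_delay(4096, L * fs)         # single-stage downsampling
t_stage1 = filter_delay(61, L * fs)           # first stage at L*fs
t_stage2 = filter_delay(255, L2 * fs)         # second stage at (L/L1)*fs = L2*fs

for name, t in [("Nyquist", t_nyquist), ("single", t_single),
                ("stage 1", t_stage1), ("stage 2", t_stage2)]:
    print(f"{name}: {t * 1e6:.2f} us")
```

The comparison shows that the first stage adds very little latency because it runs at the
high rate, while the long filter at the lower rate dominates the total delay.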
3.2 AD Converters

The choice of an AD converter for a certain application is influenced by a number of 
factors. It mainly depends on the necessary resolution for a given conversion time. Both 
of these depend upon each other and are decisively influenced by the architecture of the 
AD converter. For this reason, the specifications of an AD converter are first discussed. 
This is followed by circuit principles which influence the mutual dependence of resolution 
and conversion time. 

3.2.1 Specifications 

In the following, the most important specifications for AD conversion are presented. 

Resolution. The resolution for a given word-length $w$ of an AD converter determines the
smallest amplitude

$$x_{\min} = Q = x_{\max} 2^{-(w-1)}, \tag{3.34}$$

which is equal to the quantization step $Q$.

Conversion Time. The minimum sampling period $T_S = 1/f_S$ between two samples is
called the conversion time.

Sample-and-hold Circuit. Before quantization, the time-continuous function is sampled
with the help of a sample-and-hold circuit, as shown in Fig. 3.24a.

The sampling period $T_S$ is divided into the sampling time $t_S$ in which the output voltage
$U_2$ follows the input voltage $U_1$, and the hold time $t_H$. During the hold time the output
voltage $U_2$ is constant and is converted into a binary word by quantization.



Figure 3.24 (a) Sample-and-hold circuit, (b) Input and output with clock signal. ($t_S$ = sampling time,
$t_H$ = hold time, $t_{AD}$ = aperture delay.)



Aperture Delay. The time $t_{AD}$ elapsed between start of hold and actual hold mode (see
Fig. 3.24b) is called the aperture delay.

Aperture Jitter. The variation in aperture delay from sample to sample is called the aperture
jitter $t_{ADJ}$. The influence of aperture jitter limits the useful bandwidth of the sampled
signal. This is because at high frequency a deterioration of the signal-to-noise ratio occurs.
Assuming a Gaussian PDF aperture jitter, the signal-to-noise ratio owing to aperture jitter
as a function of frequency $f$ can be written as

$$\mathrm{SNR}_J = -20 \log_{10}(2 \pi f \, t_{ADJ}) \ \mathrm{dB}. \tag{3.35}$$
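
To get a feeling for the magnitudes involved, (3.35) can be evaluated and inverted with the
following Python sketch. The function names and the example values (a 20 kHz tone, 1 ns of
jitter, a 96 dB target) are illustrative assumptions, not values from the text.

```python
import math

def snr_aperture_jitter(f_hz, t_adj_s):
    """SNR in dB limited by aperture jitter t_ADJ at signal frequency f, as in (3.35)."""
    return -20.0 * math.log10(2.0 * math.pi * f_hz * t_adj_s)

def max_jitter_for_snr(f_hz, snr_db):
    """Largest jitter (seconds) that still reaches snr_db at frequency f (inverse of (3.35))."""
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_hz)

print(round(snr_aperture_jitter(20e3, 1e-9), 1))      # 20 kHz tone, 1 ns jitter
print(max_jitter_for_snr(20e3, 96.0))                 # jitter budget for about 16 bits
```

Since the SNR drops by about 6 dB for every doubling of the signal frequency, the jitter
requirement is set by the highest frequency that has to be converted.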

Offset Error and Gain Error. The offset and gain errors of an AD converter are shown in 
Fig. 3.25. The offset error results in a horizontal displacement of the real curve compared 
with the dashed ideal curve of an AD converter. The gain error is expressed as the deviation 
from the ideal gradient of the curve. 




Figure 3.25 Offset error and gain error. 



Differential Nonlinearity. The differential nonlinearity

$$\mathrm{DNL} = \frac{\Delta x / Q}{\Delta x_Q} - 1 \ \mathrm{LSB} \tag{3.36}$$

describes the error of the step size of a certain code word in LSB units. For ideal quantization,
the increase $\Delta x$ in the input voltage up to the next output code $x_Q$ is equal to
the quantization step $Q$ (see Fig. 3.26). The difference of two consecutive output codes is
denoted by $\Delta x_Q$. When the output code changes from 010 to 011, the step size is 1.5 LSB
and therefore the differential nonlinearity DNL = 0.5 LSB. The step size between the
codes 011 and 101 is 0 LSB and the code 100 is missing. The differential nonlinearity
is DNL = $-1$ LSB.
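
The two cases just described follow directly from (3.36), as the following Python sketch
shows. The function name and argument names are illustrative.

```python
def dnl_lsb(delta_x_over_q, delta_code):
    """Differential nonlinearity in LSB, following (3.36).

    delta_x_over_q: input-voltage increase up to the next output code, in units of Q
    delta_code:     difference between the two consecutive output codes
    """
    return delta_x_over_q / delta_code - 1.0

print(dnl_lsb(1.5, 1))   # code change 010 -> 011 with a 1.5 LSB step
print(dnl_lsb(0.0, 2))   # code change 011 -> 101, code 100 missing
```

A value of $-1$ LSB is thus the signature of a missing code.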

Integral Nonlinearity. The integral nonlinearity (INL) describes the error between the
quantized and the ideal continuous value. This error is given in LSB units. It arises owing
to the accumulated error of the step size. This error (see Fig. 3.27) changes continuously
from one output code to the next.

Monotonicity. The progressive increase in quantizer output code for a continuously in- 
creasing input voltage and progressive decrease in quantizer output code for a continuously 
decreasing input voltage is called monotonicity. An example of non-monotonic behavior is 
shown in Fig. 3.28 where one output code does not occur. 
Figure 3.26 Differential nonlinearity.

Figure 3.27 Integral nonlinearity.

Figure 3.28 Monotonicity.

Total Harmonic Distortion. The harmonic distortion is calculated for an AD converter at
full range with a sinusoid ($X_1 = 0$ dB) of given frequency. The selective measurement of
harmonics of the second to the ninth order is used to compute

$$\mathrm{THD} = 20 \log_{10} \sqrt{\sum_{n=2}^{\infty} \left[10^{(-X_n/20)}\right]^2} \ \mathrm{dB} \tag{3.37}$$

$$= \sqrt{\sum_{n=2}^{\infty} \left[10^{(-X_n/20)}\right]^2} \cdot 100\,\%, \tag{3.38}$$

where $X_n$ are the harmonics in dB.
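
A small Python sketch shows how (3.37) and (3.38) combine measured harmonic levels. It
assumes that each $X_n$ is given as a positive number of dB below the 0 dB fundamental, which
is how the exponent $-X_n/20$ is read here; the measurement values in the example are
hypothetical.

```python
import math

def thd(harmonics_db):
    """THD from harmonic levels X_n, following (3.37) and (3.38).

    harmonics_db: values X_n in dB by which each harmonic lies below the
    0 dB fundamental (assumed sign convention, so 60 means -60 dB).
    Returns (THD in dB, THD in percent).
    """
    rss = math.sqrt(sum((10.0 ** (-x / 20.0)) ** 2 for x in harmonics_db))
    return 20.0 * math.log10(rss), rss * 100.0

thd_db, thd_percent = thd([80.0, 90.0, 100.0])   # hypothetical measurement values
print(round(thd_db, 2), round(thd_percent, 4))
```

Because the harmonic amplitudes are added as powers, the strongest harmonic dominates the
result.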

THD+N: Total Harmonic Distortion plus Noise. For the calculation of harmonic dis- 
tortion plus noise, the test signal is suppressed by a stop-band filter. The measurement of 
harmonic distortion plus noise is performed by measuring the remaining broad-band noise 
signal which consists of integral and differential nonlinearity, missing codes, aperture jitter, 
analog noise and quantization error. 

3.2.2 Parallel Converter 

Parallel Converter. A direct method for AD conversion is called parallel conversion (flash
converter). In parallel converters, the output voltage of the sample-and-hold circuit is
compared with a reference voltage $U_R$ with the help of $2^w - 1$ comparators (see Fig. 3.29).
The sample-and-hold circuit is controlled with sampling rate $f_S$ so that, during the hold
time $t_H$, a constant voltage at the output of the sample-and-hold circuit is available. The
outputs of the comparators are fed at sampling clock rate into a $(2^w - 1)$-bit register and
converted by a coding logic to a $w$-bit data word. This is fed at sampling clock rate to an
output register. The sampling rates that can be achieved lie between 1 and 500 MHz for a
resolution of up to 10 bits. Owing to the large number of comparators, the technique is not
feasible for high precision.

Figure 3.29 Parallel converter.
Half-flash Converter. In half-flash AD converters (Fig. 3.30), two $m$-bit parallel converters
are used in order to convert two different ranges. The first $m$-bit AD converter gives a
digital output word which is converted into an analog voltage using an $m$-bit DA converter.
This voltage is now subtracted from the output voltage of the sample-and-hold circuit.
The difference voltage is digitized with a second $m$-bit AD converter. The rough and fine
quantization leads to a $w$-bit data word with a subsequent logic.




Figure 3.30 Half-flash AD converter. 



Subranging Converter. A combination of direct conversion and sequential procedure is
carried out for subranging AD converters (see Fig. 3.31). In contrast to the half-flash
converter, only one parallel converter is required. The switches $S_1$ and $S_2$ take the values
of 0 and 1. First the output voltage of a sample-and-hold circuit and then the difference
voltage amplified by a factor $2^m$ is fed to an $m$-bit AD converter. The difference voltage is
formed with the help of the output voltage of an $m$-bit DA converter and the output voltage
of the sample-and-hold circuit. The conversion rates lie between 100 kHz and 40 MHz
where a resolution of up to 16 bits is achieved.




Figure 3.31 Subranging AD converter. 



3.2.3 Successive Approximation 

AD converters with successive approximation consist of the functional modules shown in
Fig. 3.32. The analog voltage is converted into a $w$-bit word within $w$ cycles. The converter
consists of a comparator, a $w$-bit DA converter and logic for controlling the successive
approximation.

Figure 3.32 AD converter with successive approximation.



The conversion process is explained with the help of Fig. 3.33. First, it is checked
whether a positive or negative voltage is present at the comparator. If it is positive, the
DA converter outputs $+0.5 U_R$ and the comparator checks whether the input voltage is
greater or less than $+0.5 U_R$. Depending on the result, $(+0.5 \pm 0.25) U_R$ is then output
by the DA converter and the comparator output is evaluated again. This procedure is performed
$w$ times and leads to a $w$-bit word.
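
The procedure amounts to a binary search over the input range. The following Python sketch
models it for a bipolar input range $[-U_R, U_R)$ and returns an offset-binary code; the
function names, the code format and the idealized comparator are assumptions made for
illustration only.

```python
def sar_convert(u, w, u_ref):
    """Successive approximation of a bipolar voltage u in [-u_ref, u_ref).

    Returns the w-bit offset-binary code as an integer (0 ... 2**w - 1).
    """
    trial = 0.0              # first comparison: sign of the input
    step = u_ref / 2.0
    code = 0
    for _ in range(w):
        bit = 1 if u >= trial else 0       # comparator decision
        code = (code << 1) | bit
        trial += step if bit else -step    # next DA converter output
        step /= 2.0
    return code

def code_to_voltage(code, w, u_ref):
    """Lower edge of the quantization interval that belongs to a code."""
    return -u_ref + code * (2.0 * u_ref / 2 ** w)

code = sar_convert(0.3, 8, 1.0)
print(code, code_to_voltage(code, 8, 1.0))
```

Each cycle halves the remaining uncertainty, which is why exactly $w$ comparator decisions
are needed for a $w$-bit word.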



Figure 3.33 Successive approximation.



For a resolution of 12 bits, sampling rates of up to 1 MHz can be achieved. Higher
resolutions of more than 16 bits are possible at lower sampling rates.



3.2.4 Counter Methods 

In contrast to the conversion techniques of the previous sections for high conversion rates, 
the following techniques are used for sampling rates smaller than 50 kHz. 

Forward-backward Counter. A technique which operates like successive approximation 
is the forward-backward counter shown in Fig. 3.34. A logic controls a clocked forward- 
backward counter whose output data word provides an analog output voltage via a w-bit 
DA converter. The difference signal between this voltage and the output voltage of the 
sample-and-hold circuit determines the direction of counting. The counter stops when the 
corresponding output voltage of the DA converter is equal to the output voltage of the 
sample-and-hold circuit. 

Single-slope Counter. The single-slope AD converter shown in Fig. 3.35 compares the
output voltage of the sample-and-hold circuit with a voltage of a sawtooth generator. The
sawtooth generator is started every sampling period. As long as the input voltage is greater
than the sawtooth voltage, the clock impulses are counted. The counter value corresponds
to the digital value of the input voltage.

Figure 3.34 AD converter with forward-backward counter.

Figure 3.35 Single-slope AD converter.



Dual-slope Converter. A dual-slope AD converter is shown in Fig. 3.36. In the first phase
in which a switch $S_1$ is closed for a counter period $t_1$, the output voltage of the
sample-and-hold circuit is fed to an integrator of time-constant $\tau$. During the second phase, the
switch $S_2$ is closed and the switch $S_1$ is opened. The reference voltage is switched to the
integrator and the time to reach a threshold is determined by counting the clock impulses by
a counter. Figure 3.36 demonstrates this for three different voltages $U_2$. The slope during
time $t_1$ is proportional to the output voltage $U_2$ of the sample-and-hold circuit, whereas the
slope is constant when the reference voltage $U_R$ is connected to the integrator. The ratio
$U_2/U_R = d/D$ leads to the digital output word.

Figure 3.36 Dual-slope AD converter.

3.2.5 Delta-sigma AD Converter

The delta-sigma AD converter in Fig. 3.37 requires no sample-and-hold circuit owing to its
high conversion rate. The analog band-limiting low-pass filter and the digital low-pass filter
for downsampling to a sampling rate $f_S$ are usually on the same circuit. The linear phase
nonrecursive digital low-pass filter in Fig. 3.37 has a 1-bit input signal and leads to a $w$-bit
output signal owing to the $N$ filter coefficients $h_0, h_1, \ldots, h_{N-1}$ which are implemented
with a word-length of $w$ bits. The output signal of the filter results from the summation of
the filter coefficients, each weighted by the 1-bit input value (0 or 1), of the nonrecursive
low-pass filter. The downsampling by a
factor $L$ is performed by taking every $L$th sample out of the filter and writing to the output
register. In order to reduce the number of operations the filtering and downsampling can be
performed only every $L$th input sample.
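
The remark that the filter output is just a sum of coefficients can be made concrete with a
short Python sketch. The function name and the toy coefficients are illustrative; a real
converter would use a long low-pass design with $N$ in the thousands.

```python
def decimate_one_bit(bits, h, L):
    """Nonrecursive low-pass filtering of a 1-bit stream with downsampling by L.

    bits: sequence of 0/1 input values at the rate L*f_S
    h:    the N filter coefficients h_0 ... h_{N-1}
    Only every L-th output is computed; each one is the sum of the
    coefficients that currently face a 1 in the delay line.
    """
    N = len(h)
    out = []
    for n in range(N - 1, len(bits), L):
        out.append(sum(h[k] for k in range(N) if bits[n - k]))
    return out

h = [0.25, 0.25, 0.25, 0.25]                 # toy moving-average coefficients
print(decimate_one_bit([1, 1, 0, 0, 1, 0, 1, 0], h, L=4))
```

No multiplications are required, and skipping the outputs that would be discarded by the
downsampler reduces the number of additions by the factor $L$.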

Applications of delta-sigma AD converters are found at sampling rates of up to 100 kHz 
with a resolution of up to 24 bits. 



3.3 DA Converters 

Circuit principles for DA converters are mainly based on direct conversion techniques of 
the input code. Achievable sampling rates are accordingly high. 



3.3.1 Specifications 

The definitions of resolution, total harmonic distortion (THD) and total harmonic distortion 
plus noise (THD+N) correspond to those for AD converters. Further specifications are 
discussed in the following. 
Figure 3.37 Delta-sigma AD converter.

Settling Time. The time interval between transferring a binary word and achieving the
analog output value within a specific error range is called the settling time $t_{SE}$. The settling
time determines the maximum conversion frequency $f_{S\max} = 1/t_{SE}$. Within this time,
glitches between consecutive amplitude values can occur (see Fig. 3.38). With the help of
a sample-and-hold circuit (deglitcher), the output voltage of the DA converter is sampled
after the settling time and held.

Figure 3.38 Settling time and sample-and-hold function.



Offset and Gain Error. The offset and gain errors of a DA converter are shown in Fig. 3.39. 

Differential Nonlinearity. The differential nonlinearity for DA converters describes the
step size error of a code word in LSB units. For ideal quantization, the increase $\Delta x$ of
the output voltage until the next code word corresponding to the output voltage is equal to
the quantization step size $Q$ (see Fig. 3.40). The difference between two consecutive input
codes is termed $\Delta x_Q$. Differential nonlinearity is given by

$$\mathrm{DNL} = \frac{\Delta x / Q}{\Delta x_Q} - 1 \ \mathrm{LSB}. \tag{3.39}$$

For the code steps from 001 to 010 as shown in Fig. 3.40, the step size is 1.5 LSB, and
therefore the differential nonlinearity DNL = 0.5 LSB. The step size between the codes 010
and 011 is 0.75 LSB and DNL = $-0.25$ LSB. The step size for the code change from 011 to 100
is 0 LSB (DNL = $-1$ LSB).

Figure 3.39 Offset and gain error.

Figure 3.40 Differential nonlinearity.

Integral Nonlinearity. The integral nonlinearity describes the maximum deviation of the 
output voltage of a real DA converter from the ideal straight line (see Fig. 3.41). 

Monotonicity. The continuous increase in the output voltage with increasing input code
and the continuous decrease in the output voltage with decreasing input code is called
monotonicity. A non-monotonic behavior is presented in Fig. 3.42.

Figure 3.41 Integral nonlinearity.

Figure 3.42 Monotonicity.



3.3.2 Switched Voltage and Current Sources 

Switched Voltage Sources. The DA conversion with switched voltage sources shown in
Fig. 3.43a is carried out with a reference voltage connected to a resistor network. The
resistor network consists of $2^w$ resistors of equal resistance and is switched in stages to a
binary-controlled decoder so that, at the output, a voltage $U_2$ is present corresponding to
the input code. Figure 3.43b shows the decoder for a 3-bit input code 101.

Switched Current Sources. DA conversion with $2^w$ switched current sources is shown
in Fig. 3.44. The decoder switches the corresponding number of current sources onto the
current-voltage converter. The advantage of both techniques is the monotonicity which is
guaranteed for ideal switches but also for slightly deviating resistances. The large number
of resistors in switched voltage sources or the large number of switched current sources
causes problems for long word-lengths. The techniques are used in combination with other
methods for DA conversion of higher significant bits.

3.3.3 Weighted Resistors and Capacitors 

A reduction in the number of identical resistors or current sources is achieved with the 
following method. 
Figure 3.44 Switched current sources.



Weighted Resistors. DA conversion with $w$ switched current sources which are weighted
according to

$$I_1 = 2 I_2 = 4 I_3 = \cdots = 2^{w-1} I_w \tag{3.40}$$

is shown in Fig. 3.45. The output voltage is

$$U_2 = -R \cdot I = -R \cdot (b_1 I_1 2^0 + b_2 I_2 2^1 + b_3 I_3 2^2 + \cdots + b_w I_w 2^{w-1}), \tag{3.41}$$

where $b_n$ takes values 0 or 1. The implementation of DA conversion with switched current
sources is carried out with weighted resistors as shown in Fig. 3.46. The output voltage is




(3.42) 

(3.43) 



Figure 3.45 Weighted current sources.

Figure 3.46 DA conversion with weighted resistors.

Weighted Capacitors. DA conversion with weighted capacitors is shown in Fig. 3.47.
During the first phase (switch position 1 in Fig. 3.47) all capacitors are discharged. During
the second phase, all capacitors that belong to 1 bits are connected to a reference voltage.
Those capacitors belonging to 0 bits are connected to ground. The charge on the capacitors
$C_a$ that are connected with the reference voltage can be set equal to the total charge on all
capacitors $C_g$, which leads to

$$U_R C_a = U_R \left( b_1 C + \frac{b_2 C}{2} + \frac{b_3 C}{2^2} + \cdots + \frac{b_w C}{2^{w-1}} \right) = C_g U_2 = 2 C U_2. \tag{3.44}$$

Hence, the output voltage is

$$U_2 = (b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \cdots + b_w 2^{-w}) U_R. \tag{3.45}$$

Figure 3.47 DA conversion with weighted capacitors.
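
The charge balance (3.44) can be traced numerically with the following Python sketch. It
assumes an additional terminating capacitor of value $C/2^{w-1}$ so that the total capacitance
becomes $C_g = 2C$; this element and the function name are assumptions made here for
illustration.

```python
def weighted_capacitor_dac(bits, u_ref, c=1.0):
    """Output voltage of a weighted-capacitor DA converter via charge balance (3.44).

    bits: [b_1, ..., b_w] with b_1 as the most significant bit
    The capacitors are C, C/2, ..., C/2**(w-1); an extra terminating capacitor
    C/2**(w-1) is assumed here so that the total capacitance is C_g = 2C.
    """
    w = len(bits)
    caps = [c / 2 ** n for n in range(w)]
    c_a = sum(cap for cap, b in zip(caps, bits) if b)   # tied to the reference
    c_g = sum(caps) + caps[-1]                          # total capacitance, 2C
    return u_ref * c_a / c_g

print(weighted_capacitor_dac([1, 0, 1], u_ref=1.0))     # code 101
```

The absolute value of $C$ cancels out, so only the capacitor ratios determine the accuracy
of the conversion.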
3.3.4 R-2R Resistor Networks

The DA conversion with switched current sources can also be carried out with an R-2R 
resistor network as shown in Fig. 3.48. In contrast to the method with weighted resistors, 
the ratio of the smallest to largest resistor is reduced to 2:1. 




Figure 3.48 Switched current sources with R-2R resistor network. 



The weighting of currents is achieved by a current division at every junction. Looking
right from every junction, a resulting resistance $R + 2R \parallel 2R = 2R$ is found which is equal
to the resistance in the vertical direction downwards from the junction. For the current
from junction 1 it follows that $I_1 = U_R/2R$, and for the current from junction 2, $I_2 = I_1/2$.
Hence, a binary weighting of the $w$ currents is given by

$$I_1 = 2 I_2 = 4 I_3 = \cdots = 2^{w-1} I_w. \tag{3.46}$$



The output voltage $U_2$ can be written as

$$U_2 = -RI = -R \left( \frac{b_1}{2R} + \frac{b_2}{4R} + \cdots + \frac{b_w}{2^w R} \right) U_R \tag{3.47}$$

$$= -U_R \left( b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \cdots + b_w 2^{-w} \right). \tag{3.48}$$


3.3.5 Delta-sigma DA Converter 

A delta-sigma DA converter is shown in Fig. 3.49. The converter is provided with $w$-bit
data words by an input register with the sampling rate $f_S$. This is followed by a sample rate
conversion up to $L f_S$ by upsampling and a digital low-pass filter. A delta-sigma modulator
converts the $w$-bit input signal into a 1-bit output signal. The delta-sigma modulator
corresponds to the model in Section 3.1.3. Subsequently, the DA conversion of the 1-bit signal
is performed followed by the reconstruction of the time-continuous signal by an analog
low-pass filter.

Figure 3.49 Delta-sigma DA converter.



3.4 Java Applet - Oversampling and Quantization

The applet shown in Fig. 3.50 demonstrates the influence of oversampling on the power
spectral density of the quantization error. For a given quantization word-length the noise level
can be reduced by changing the oversampling factor. The graphical interface of this applet
presents several quantization and oversampling values; these can be used to experiment with
the noise reduction level. An additional FFT spectral representation provides a visualization of
this audio effect.

Figure 3.50 Java applet - oversampling and quantization.



The following functions can be selected on the lower right of the graphical user interface:

• Quantizer

- word-length $w$ leads to quantization step size $Q = 2^{-(w-1)}$.

• Dither

- rect dither - uniform probability density function

- tri dither - triangular probability density function

- high-pass dither - triangular probability density function and high-pass power
spectral density.

• Noise shaping

- first-order $H(z) = z^{-1}$.

• Oversampling factor

- Factors from 4 up to 64 can be tested depending on the CPU performance of
your machine.

You can choose between two predefined audio files from our web server (audio1.wav or
audio2.wav) or your own local wav file to be processed [Gui05].



3.5 Exercises 

1. Oversampling 

1. How do we define the power spectral density $S_{XX}(e^{j\Omega})$ of a signal $x(n)$?

2. What is the relationship between signal power (variance) and power spectral
density $S_{XX}(e^{j\Omega})$?

3. Why do we need to oversample a time-domain signal? 

4. Explain why an oversampled PCM A/D converter has lower quantization noise power 
in the base-band than a Nyquist rate sampled PCM A/D converter. 

5. How do we perform oversampling by a factor of L in the time domain? 

6. Explain the frequency-domain interpretation of the oversampling operation. 

7. What is the pass-band and stop-band frequency of the analog anti-aliasing filter? 

8. What is the pass-band and stop-band frequency of the digital anti-aliasing filter 
before downsampling? 

9. How is the downsampling operation performed (time-domain and frequency-domain 
explanation)? 
2. Delta-sigma Conversion

1. Why can we apply noise shaping in an oversampled AD converter?

2. Show how the delta-sigma converter (DSC) has a lower quantization error power in 
the base-band than an oversampled PCM A/D converter. 

3. How do the power spectral density and variance change in relation to the order of the 
DSC? 

4. How is noise shaping achieved in an oversampled delta-sigma AD converter? 

5. Show the noise shaping effect (with Matlab plots) of a delta-sigma modulator and
how the improvement of the signal-to-noise ratio is achieved for pure oversampling
and for a delta-sigma modulator.

6. Using the previous Matlab plots, specify which order and oversampling factor L will 
be needed for a 1-bit delta-sigma converter for SNR = 100 dB. 

7. What is the difference between the delta-sigma modulator in the delta-sigma AD 
converter and the delta-sigma DA converter? 

8. How do we achieve a w-bit signal representation at Nyquist sampling frequency from 
an oversampled 1-bit signal? 

9. Why do we need to oversample a w-bit signal for a delta-sigma DA converter? 

Chapter 4

Audio Processing Systems 



Digital signal processors (DSPs) are used for discrete-time signal processing. Their
architecture and instruction set are specially designed for real-time execution of signal
processing algorithms. DSPs of different manufacturers and their use in practical circuits
will be discussed. The restriction to the architecture and practical circuits will provide the
user with the criteria necessary for selecting a DSP for a particular application. The
advantages of a certain processor with respect to fast execution of algorithms (digital
filter, adaptive filter, FFT, etc.) follow directly from its architectural features.
The programming methods and application programs are not dealt with here, because the
DSP user guides from different manufacturers provide adequate information in the form of
sample programs for a variety of signal processing algorithms.

After comparing DSPs with other microcomputers, the following topics will be discussed
in the forthcoming sections:

• fixed-point DSPs; 

• floating-point DSPs; 

• development tools; 

• single-processor systems (peripherals, control principles); 

• multi-processor systems (coupling principles, control principles). 

The internal design of microcomputers is mainly based on two architectures: the von
Neumann architecture, which uses a shared instruction/data bus, and the Harvard architecture,
which has separate buses for instructions and data. Processors based on these architectures
are CISCs, RISCs and DSPs. Their characteristics are given in Table 4.1. Besides the
internal properties listed in the table, DSPs have special on-chip peripherals which are
suited to signal processing applications. The fast response to external interrupts enables
their use in real-time operating systems.

Table 4.1 CISC, RISC and DSP.

Type Characteristics 

CISC Complex Instruction Set Computer 

• von Neumann architecture 

• assembler programming 

• large number of instructions 

• computer families 

• compilers 

• application: universal microcomputers 

RISC Reduced Instruction Set Computer 

• von Neumann architecture/Harvard architecture 

• number of instructions <50 

• number of address modes <4, instruction formats <4 

• hard-wired instruction (no microprogramming) 

• processing most of the instructions in one cycle 

• optimizing compilers for high-level programming languages 

• application: workstations 

DSP Digital Signal Processor 

• Harvard architecture 

• several internal data buses 

• assembler programming 

• parallel processing of several instructions in one cycle 

• optimizing compilers for high-level programming languages 

• real-time operating systems 

• application: real-time signal processing 



4.1 Digital Signal Processors 

4.1.1 Fixed-point DSPs 

The discrete-time and discrete-amplitude output of an AD converter is usually represented
in 2's complement format. The processing of these number sequences is carried out with
fixed-point or floating-point arithmetic. The output of a processed signal is again in 2's
complement format and is fed to a DA converter. The signed fractional representation (2's
complement) is the common method for algorithms in fixed-point number representation.
For address generation and modulo operations unsigned integers are used. Figure 4.1 shows
a schematic diagram of a typical fixed-point DSP. The main building blocks are program
controller, arithmetic logic unit (ALU) with a multiplier-accumulator (MAC), program and
data memory and interfaces to external memory and peripherals. All blocks are connected
with each other by an internal bus system. The internal bus system has separate instruction
and data buses. The data bus itself can consist of more than one parallel bus enabling
it, for instance, to transmit both operands of a multiplication instruction to the MAC in
parallel. The internal memory consists of instruction and data RAM and additional ROM
memory. This internal memory permits fast execution of internal instructions and data
transfer. For increasing memory space, address/control and data buses are connected to
external memories like EPROM, ROM and RAM. The connection of the external bus
system to the internal bus architecture has great influence on efficient execution of external
instructions as well as on processing external data. In order to connect serially operating
AD/DA converters, special serial interfaces with high transmission rates are offered by
several DSPs. Moreover, some processors support direct connection to an RS232 interface.
The control from a microprocessor can be achieved via a host interface with a word-length
of 8 bits.
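The signed fractional 2's complement representation mentioned above can be made
concrete with a small sketch. The Python functions below are illustrative only: their
names, the rounding and the saturation at the range limits are assumptions made here and
do not describe a particular DSP.

```python
def to_fractional(x, w):
    """Convert x in [-1, 1) to a w-bit two's complement code (as an unsigned int)."""
    scale = 1 << (w - 1)
    n = int(round(x * scale))
    n = max(-scale, min(scale - 1, n))  # saturate instead of wrapping around
    return n & ((1 << w) - 1)


def from_fractional(code, w):
    """Convert a w-bit two's complement code back to a fraction in [-1, 1)."""
    scale = 1 << (w - 1)
    if code >= scale:  # sign bit set: the value is negative
        code -= 1 << w
    return code / scale
```

For a 16-bit word, 0.5 maps to the code 0x4000 and -1.0 to 0x8000, while +1.0 cannot be
represented and is saturated to 0x7FFF in this sketch.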



Figure 4.1 Schematic diagram of a fixed-point DSP. [Diagram labels: address, data and
control buses; host interface, serial AD/DA interface, serial interface.]

An overview of fixed-point DSPs with respect to word-length and cycle time is shown
in Table 4.2. If quantization affects the stability or numerical precision of the applied
algorithm, the precision of the arithmetic can be doubled. The cycle time in connection
with processing time (in processor cycles) of a combined multiplication and accumulation
command gives insight into the computing power of a particular processor type. The cycle
time directly results from the maximum clock frequency. The instruction processing time
depends mainly on the internal instruction and data structure as well as on the external
memory connections of the processor. The fast access to external instruction and data
memories is of special significance in complex algorithms and in processing huge data
loads. Further attention has to be paid to the linking of serial data connections with AD/DA
converters and the control by a host computer over a special host interface. Complex
interface circuits can thereby be avoided. For stand-alone solutions, the program can
also be loaded from a simple external EPROM.
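The relation between clock frequency, cycle time and computation power can be checked
against the entries of Table 4.2. The following Python sketch uses helper names chosen
here for illustration; reading the quotient of MMACS and clock frequency as the number
of MAC operations per cycle is an interpretation, not a statement from the data sheets.

```python
def cycle_time_ns(clock_mhz):
    """Cycle time in ns for a given maximum clock frequency in MHz."""
    return 1000.0 / clock_mhz


def macs_per_cycle(mmacs, clock_mhz):
    """MAC operations per clock cycle implied by a computation power in MMACS."""
    return mmacs / clock_mhz


# (type, clock in MHz, MMACS) as listed in Table 4.2
for name, clock, mmacs in [("ADSP-BF533", 756, 1512), ("MOT-DSP56309", 100, 100)]:
    print(name, round(cycle_time_ns(clock), 2), "ns,", macs_per_cycle(mmacs, clock), "MAC/cycle")
```

A processor with 100 MHz clock and one MAC per cycle therefore needs 10 ns per filter tap,
provided both operands are available without additional memory cycles.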

For signal processing algorithms, the following software commands are necessary: 

1. MAC (multiply and accumulate): combined multiplication and addition command;

2. simultaneous transfer of both operands for multiplication to the MAC (parallel move); 

3. bit-reversed addressing (for FFT); 

4. modulo addressing (for windowing and filtering). 
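What these commands do can be sketched in software. The Python code below is a plain
model with names chosen here (bit_reverse, CircularFir); on a DSP the index computations
are carried out by the address generation unit in parallel with the MAC rather than by
explicit instructions.

```python
def bit_reverse(index, bits):
    """Reverse the lowest 'bits' bits of index (FFT input/output reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


class CircularFir:
    """FIR filter whose delay line is addressed modulo its length."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.state = [0.0] * len(self.coeffs)
        self.pos = 0  # write position of the newest sample

    def step(self, x):
        n = len(self.coeffs)
        self.state[self.pos] = x
        acc = 0.0
        for k, c in enumerate(self.coeffs):
            acc += c * self.state[(self.pos - k) % n]  # MAC with modulo addressing
        self.pos = (self.pos + 1) % n
        return acc
```

For an 8-point FFT, bit_reverse(i, 3) maps the indices 0...7 to 0, 4, 2, 6, 1, 5, 3, 7.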

Different signal processors have different processing times for FFT implementations.
The latest signal processors with improved architecture have shorter processing times. The
instruction cycles for the combined multiplication and accumulation command (application:
windowing, filtering) are approximately equal for different processors, but processing
cycles for external operands have to be considered.

Table 4.2 Fixed-point DSPs (Analog Devices AD, Texas Instruments TI, Motorola MOT, Agere
Systems AG).

Type               Word-length   Cycle time (MHz/ns)   Computation power (MMACS)
ADSP-BF533         16            756/1.3               1512
ADSP-BF561         16            756/1.3               3024
ADSP-T201          32            600/1.67              4800
TI-TMS320C6414     16            1000/1                4000
MOT-DSP56309       24            100/10                100
MOT-DSP56L307      24            160/6.3               160
AG-DSP16410 x 2    16            195/5.1               780



4.1.2 Floating-point DSPs 



Figure 4.2 shows the block diagram of a typical floating-point DSP. The main characteristics
of the different architectures are the dual-port principle (Motorola, Texas Instruments)
and the external Harvard architecture (Analog Devices, NEC). Floating-point DSPs
internally have multiple bus systems in order to accelerate data transfer to the processing
unit. An on-chip DMA controller and cache memory support higher data transfer rates. An
overview of floating-point DSPs is shown in Table 4.3. Besides the standardized
floating-point representation IEEE-754, there are also manufacturer-dependent number
representations.

Figure 4.2 Block diagram of a floating-point digital signal processor.

Table 4.3 Floating-point DSPs.

Type               Word-length   Cycle time (MHz/ns)   Computation power (MFLOPS)
ADSP-21364         32            300/3.3               1800
ADSP-21267         32            150/6.6               900
ADSP-21161N        32            100/10                600
TI-TMS320C6711     32            200/5                 1200



4.2 Digital Audio Interfaces 

For transferring digital audio signals, two transmission standards have been established
by the AES (Audio Engineering Society) and the EBU (European Broadcasting Union),
respectively. These standards are for two-channel transmission [AES92] and for multichannel
transmission of up to 56 audio signals.

Figure 4.3 Two-channel format.

Figure 4.4 Two-channel format (subframe).



4.2.1 Two-channel AES/EBU Interface 

For the two-channel AES/EBU interface, professional and consumer modes are defined.
The outer frame is identical for both modes and is shown in Fig. 4.3. For each sampling
period a frame is defined that consists of two subframes: channel 1 with preamble X and
channel 2 with preamble Y. A total of 192 frames form a block, and the block start is
characterized by a special preamble Z. A subframe consists of 32 bits, allocated
as in Fig. 4.4. The preamble consists of 4 bits (bit 0, ..., 3) and the audio data of up to
24 bits (bit 4, ..., 27). The last four bits of the subframe characterize Validity (validity
of data word or error), User Status (usable bit), Channel Status (from 192 bits/block = 24
bytes coded status information for the channel) and Parity (even parity). The transmission
of the serial data bits is carried out with a biphase code. This is done with the help of an
XOR relationship between clock (of double bit rate) and the serial data bits (Fig. 4.5).
At the receiver, clock retrieval is achieved by detecting the preamble (X = 11100010,
Y = 11100100, Z = 11101000) as it violates the coding rule (see Fig. 4.6). The meaning
of the 24 bytes for channel status information is summarized in Table 4.4. An exact bit
allocation of the first three important bytes of this channel status information is presented
in Fig. 4.7. In the individual fields of byte 0, preemphasis and sampling rate are specified
besides professional/consumer modes and the characterization of data/audio (see Tables 4.5
and 4.6). Byte 1 determines the channel mode (Table 4.7). The consumer format (often
labeled SPDIF = Sony/Philips Digital Interface Format) differs from the professional format
in the definition of the channel status information and the technical specifications for inputs
and outputs. The bit allocation for the first four bytes of the channel status information
is shown in Fig. 4.8. For consumer applications, two-wired leads with RCA connectors are used.
The inputs and outputs are asymmetrical. Optical connectors also exist. For professional
use, shielded two-wired leads with XLR connectors and symmetrical inputs and outputs
(professional format) are used. Table 4.8 shows the electrical specifications for professional
AES/EBU interfaces.
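The subframe layout and the biphase code described above can be modeled in a few lines.
The following Python sketch is a simplified illustration and not an implementation of
[AES92]: the function names are chosen here, the audio word is assumed to be sent LSB
first, the parity is assumed to cover bits 4 to 31, and the polarity of the preamble
(which in practice depends on the preceding line state) is ignored.

```python
PREAMBLE_X = [1, 1, 1, 0, 0, 0, 1, 0]  # channel 1, half-bit cells as quoted in the text


def subframe_bits(sample, validity=0, user=0, status=0):
    """Return bits 4..31 of a subframe: 24 audio bits (LSB first), V, U, C, P."""
    audio = [(sample >> i) & 1 for i in range(24)]
    bits = audio + [validity, user, status]
    parity = sum(bits) % 2  # even parity: the total number of ones becomes even
    return bits + [parity]


def biphase_encode(bits, level=0):
    """Biphase code: a transition at every bit boundary, one more inside a 1."""
    cells = []
    for b in bits:
        level ^= 1  # transition at the start of each bit
        cells.append(level)
        level ^= b  # additional mid-bit transition only for a 1
        cells.append(level)
    return cells


def encode_subframe(sample, **flags):
    """Preamble plus biphase-coded data: 64 half-bit cells for 32 bits."""
    data = biphase_encode(subframe_bits(sample, **flags), level=PREAMBLE_X[-1])
    return PREAMBLE_X + data
```

In the coded data no level lasts longer than two half-bit cells, whereas the preamble
contains runs of three equal cells. This is the violation of the coding rule that the
receiver uses to find the subframe start.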

Figure 4.5 Channel coding.

Figure 4.6 Preamble X.



4.2.2 MADI Interface 

For connecting audio processing systems at different locations, a MADI interface
(Multichannel Audio Digital Interface) is used. A system link by MADI is presented in
Fig. 4.9. Analog/digital I/O systems consisting of AD/DA converters, AES/EBU interfaces
(AES) and sampling rate converters (SRC) are connected to digital distribution systems
with bi-directional MADI links. The actual audio signal processing is performed in special
DSP systems which are connected to the digital distribution systems by MADI links. The

Table 4.4 Channel status bytes.

Byte     Description
0        Emphasis, sampling rate
1        Channel use
2        Sample length
3        Vector for byte 1
4        Reference bits
5        Reserved
6-9      4 bytes of ASCII origin
10-13    4 bytes of ASCII destination
14-17    4 bytes of local address
18-21    Time code
22       Flags
23       CRC



Figure 4.7 Bytes 0-2 of channel status information. [Bit allocation: byte 0 -
professional/consumer (bit 0), data/audio (bit 1), emphasis (bits 2-4), unlocked/locked
(bit 5), sampling rate (bits 6-7); byte 1 - channel mode, remaining bits not used;
byte 2 - sample length (bits 0-2), encoded length (bits 3-5), bits 6-7 set to 0.]



Table 4.5 Emphasis field.

0   None indicated, override enabled
4   None indicated, override disabled
6   50/15 µs emphasis
7   CCITT J.17 emphasis

Table 4.6 Sampling rate field.

0   None indicated (48 kHz default)
1   48 kHz
2   44.1 kHz
3   32 kHz

Table 4.7 Channel mode.

0   None indicated (2 channel default)
1   Two channel
2   Monaural
3   Primary/secondary (A = primary, B = secondary)
4   Stereo (A = left, B = right)
7   Vector to byte 3















Figure 4.8 Bytes 0-3 (consumer format). [Fields shown: byte 0 - consumer (= 0),
data/audio, copy permitted/copyright, preemphasis/none, not used, quad/stereo, mode;
byte 1 - category (0 general purpose, 2 PCM generation, 3 ADC, 4 CD, 6 DAT);
byte 2 - source number, channel number; byte 3 - sampling rate (0: 44.1 kHz, 1: 48 kHz,
2: reserved, 3: 32 kHz), accuracy (0: normal, 1: variable speed, 2: high accuracy,
3: reserved), reserved.]



Table 4.8 Electrical specifications of professional interfaces.

Output impedance   Signal amplitude   Jitter
110 Ω              2-7 V              max. 20 ns

Input impedance    Signal amplitude   Connect.
110 Ω              min. 200 mV        XLR



MADI format is derived from the two-channel AES/EBU format and allows the transmission
of 56 digital mono channels (see Fig. 4.10) within a sampling period. The MADI
frame consists of 56 AES/EBU subframes. Each channel has a preamble containing the
information shown in Fig. 4.10. Bit 0 identifies the first MADI
channel (MADI Channel 0). Table 4.9 shows the sampling rates and the corresponding
data transfer rates. The maximum data rate of 96.768 Mbit/s is required at a sampling rate of
48 kHz + 12.5%. Data transmission is done by FDDI techniques (Fiber Distributed Data
Interface). The transmission rate of 125 Mbit/s is implemented with special TAXI chips.
The transmission over a coaxial cable is already specified (see Table 4.10). The optical
transmission medium for audio applications is not yet defined.




Figure 4.9 A system link by MADI. 



Table 4.9 MADI specifications.

Sampling rate             32 kHz-48 kHz ± 12.5%
Transmission rate         125 Mbit/s
Data transfer rate        100 Mbit/s
Max. data transfer rate   96.768 Mbit/s (56 channels at 48 kHz + 12.5%)
Min. data transfer rate   50.176 Mbit/s (56 channels at 32 kHz - 12.5%)
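The maximum and minimum data transfer rates follow from 56 subframes of 32 bits per
sampling period. The short Python sketch below reproduces the two values; the function
and constant names are chosen here for illustration.

```python
CHANNELS = 56           # AES/EBU subframes per MADI frame
BITS_PER_SUBFRAME = 32  # bits per subframe


def madi_data_rate(fs_hz, varispeed=0.0):
    """Data transfer rate in bit/s at sampling rate fs_hz with relative detuning."""
    return CHANNELS * BITS_PER_SUBFRAME * fs_hz * (1.0 + varispeed)


print(madi_data_rate(48000, +0.125) / 1e6, "Mbit/s")  # upper limit in Table 4.9
print(madi_data_rate(32000, -0.125) / 1e6, "Mbit/s")  # lower limit in Table 4.9
```

Both values stay below the data transfer rate of 100 Mbit/s of the link.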



A unidirectional MADI link is shown in Fig. 4.11. The MADI transmitter and receiver 
must be synchronized by a common master clock. The transmission between FDDI chips 
is performed by a transmitter with integrated clock generation and clock retrieval at the 
receiver. 

Figure 4.10 MADI frame format. [AES/EBU format subframe: bits 0-3 carry MADI CHANNEL 0,
MADI ACTIVE, MADI A/B and MADI SYNC, bits 4-27 the audio sample word, bits 28-31 V, U, C
and P; MADI frame period: channel 0, channel 1, ..., channel 54, channel 55.]

Table 4.10 Electrical specifications (MADI).

Output impedance   Signal ampl.   Cable length           Connect.
75 Ω               0.3-0.7 V      50 m (coaxial cable)   BNC




Figure 4.11 MADI link. 

4.3 Single-processor Systems

4.3.1 Peripherals 

A common system configuration is shown in Fig. 4.12. It consists of a DSP, clock
generation, instruction and data memory and a BOOT-EPROM. After RESET, the program is
loaded into the internal RAM of the signal processor. The loading is done byte by byte
so that only an EPROM with 8-bit data word-length is necessary. In terms of circuit
complexity the connection of AD/DA converters over serial interfaces is the simplest solution.
Most fixed-point signal processors support a serial connection in which separate leads
for bit clock SCLK, sampling clock/word clock WCLK, and the serial input and output
data SDRX/SDTX are used. The clock signals are obtained from a higher reference clock
CLKIN (see Fig. 4.13). For non-serially operating AD/DA converters, parallel interfaces
can also be connected to the DSP.




Figure 4.12 DSP system with two-channel AD/DA converters (C = control, A = address, D = 
data, SDATA = serial data, SCLK = bit clock, WCLK = word clock, SDRX = serial input, SDTX = 
serial output). 



4.3.2 Control 

For controlling digital signal processors and data exchange with host processors, some 
DSPs provide a special host interface that can be read and written directly (see Fig. 4.14). 
The data word-length depends on the processor. The host interface is included in the 
external address space of the host or is connected to a local bus system, for instance a 
PC bus. 

A DSP as a coprocessor for special signal processing problems can be used by
connecting it with a dual-port RAM and additional interrupt logic to a host processor. This

Figure 4.13 Serial transmission format.




Figure 4.14 Control via a host interface of the DSP (CS = chip select, R/W = read/write, A = 
address, D = data). 



enables data transmission between the DSP system and host processor (see Fig. 4.15). This 
results in a complete separation from the host processor. The communication can either be 
interrupt-controlled or carried out by polling a memory address in a dual-port RAM. 




Figure 4.15 Control over a dual-port RAM and interrupt. 



A very simple control can be done directly via an RS232 interface. This can be carried
out via an additional asynchronous serial interface (Serial Communication Interface)
of the DSP (see Fig. 4.16).

Figure 4.16 Control over a serial interface (RS232, RS422). [Signal labels: RXD, TXD,
SCLK.]




Figure 4.17 Cascading and pipelining (SDATA = serial data, SCLK = bit clock, SYNC = 
synchronization) . 




Figure 4.18 Parallel configuration with output time-multiplex. 




Figure 4.19 Time-multiplex connection (ADR = address at a particular time). 



4.4 Multi-processor Systems 

The design of multi-processor systems can be carried out by linking signal processors by 
serial or parallel interfaces. Besides purely multi-processor DSP systems, an additional 
connection to standard bus systems can be made as well. 

Figure 4.20 Cascading and pipelining. [Signal label: PDATA.]

Figure 4.21 Parallel configuration. [Diagram label: System Bus.]

Figure 4.22 Connection over a four-port RAM. [Diagram label: System Bus.]




Figure 4.23 Signal processor systems based on standard bus system. 



4.4.1 Connection via Serial Links 

In connecting via serial links, signal processors are cascaded so that different program 
segments are distributed over different processors (see Fig. 4.17). The serial output data is 
fed into the serial input of the following signal processor. A synchronous bit clock and a 
common synchronization SYNC control the serial interface. With the help of a serial time- 
multiplex mode (Fig. 4.18) a parallel configuration can be designed which, for instance, 
feeds several parallel signal processors with serial input data. The serial outputs of signal 

Figure 4.24 Audio system.



processors provide output data in time-multiplex. A complete time-multiplex connection 
via the serial interface of the signal processor is shown in Fig. 4.19. The allocation of a 
signal processor at a particular time slot can either be fixed or carried out by an address 
control ADR. 

4.4.2 Connection via Parallel Links 

Connection via parallel links is possible with dual-port processors as well as with dual-port 
RAMs (see Fig. 4.20). A parallel configuration of signal processor systems with a local bus 
is shown in Fig. 4.21. The connection to the local bus is done either over a dual-port RAM 
or directly with a second signal processor port. Another possible configuration is the use 
of a four-port RAM as shown in Fig. 4.22. Here, one processor serves as a connector to a 

Figure 4.25 Scalable digital audio system.




Figure 4.26 Subsystem. 



system bus and feeds three other processors over a four-port RAM with control and data 
information. 

4.4.3 Connection via Standard Bus Systems 

The use of standard bus systems (VME bus, MULTIBUS, PC bus) to control multi-processor
systems is presented in Fig. 4.23. The connection of signal processors can either be carried
out directly over a control bus or with the help of a special data bus. This parallel data bus
can operate in time-multiplex. Hence, control information and data are separated. A few of
the criteria for standard bus systems are data transfer rate, interrupt request and processing,
the option of several masters, auxiliary functions (power supply, bus error, battery buffer)
and mechanical requirements.

4.4.4 Scalable Audio System 

The functional segmentation of an audio system into different stages, the analog, interface, 
digital and man-machine stages, is shown in Fig. 4.24. All stages are controlled by a 
LAN (Local Area Network). In the analog domain, crosspoint switches and microphone 
amplifiers are controlled. In the interface domain AD/DA converters and sampling rate 
converters are used. The connection to a signal processing system is done by AES/EBU and 
MADI interfaces. A host computer with a control console for the sound engineer serves as 
the central control unit. 

The realization of the digital domain with the help of a standard bus system is shown 
in Fig. 4.25. A central mixing console controls several subsystems over a host. These sub- 
systems have special control computers which control several DSP modules. The system 
concept is scalable within a subsystem and by extension to several subsystems. Audio 
data transfer between subsystems is performed by AES/EBU and MADI interfaces. The 
segmentation within a subsystem is shown in Fig. 4.26. Here, besides DSP modules, digital 
interfaces (AES/EBU, MADI, sampling rate converters, etc.) and AD/DA converters can 
be integrated. 



References 

[AES91] AES10-1991 (ANSI S4.43-1991): AES Recommended Practice for Digital 
Audio Engineering - Serial Multichannel Audio Digital Interface (MADI). 

[AES92] AES3-1992 (ANSI S4.40-1992): AES Recommended Practice for Digital 
Audio Engineering - Serial Transmission Format for Two-Channel Linearly 
Represented Digital Audio. 




Chapter 5 

Equalizers 



Spectral sound equalization is one of the most important methods for processing audio
signals. Equalizers are found in various forms in the transmission of audio signals from a
sound studio to the listener. The more complex filter functions are used in sound studios.
But in almost every consumer product, like car radios and hi-fi amplifiers, simple filter
functions are used for sound equalization. We first discuss basic filter types followed by the
design and implementation of recursive audio filters. In Sections 5.3 and 5.4 linear phase
nonrecursive filter structures and their implementation are introduced.

5.1 Basics 

For filtering of audio signals the following filter types are used: 

• Low-pass and high-pass filters with cutoff frequency $f_c$ (3 dB cutoff frequency)
are shown with their magnitude response in Fig. 5.1. They have a pass-band in the
lower and higher frequency range, respectively.

• Band-pass and band-stop filters (magnitude responses in Fig. 5.1) have a center
frequency $f_c$ and a lower and an upper cutoff frequency $f_l$ and $f_u$. They have a
pass-band and a stop-band, respectively, in the middle of the frequency range. For the
bandwidth of a band-pass or a band-stop filter we have

$f_b = f_u - f_l$. (5.1)

Band-pass filters with a constant relative bandwidth $f_b/f_c$ are very important for
audio applications [Cre03]. The bandwidth is proportional to the center frequency,
which is given by $f_c = \sqrt{f_l \cdot f_u}$ (see Fig. 5.2).

• Octave filters are band-pass filters with special cutoff frequencies given by

$f_u = 2 \cdot f_l$, (5.2)

$f_c = \sqrt{f_l \cdot f_u} = \sqrt{2} \cdot f_l$. (5.3)

A spectral decomposition of the audio frequency range with octave filters is shown
in Fig. 5.3. At the lower and upper cutoff frequency an attenuation of -3 dB occurs.

Figure 5.1 Linear magnitude responses of low-pass, high-pass, band-pass, and band-stop filters.
[Panels include a low-pass and a high-pass with normalized cutoff frequency 0.1.]



The upper octave band is represented as a high-pass. A parallel connection of octave
filters can be used for a spectral analysis of the audio signal in octave frequency
bands. This decomposition is used for the signal power distribution across the octave
bands. For the center frequencies of octave bands we get $f_{c_i} = 2 \cdot f_{c_{i-1}}$.
The weighting of octave bands with gain factors $A_i$ and summation of the weighted octave
bands represents an octave equalizer for sound processing (see Fig. 5.4). For this
application the lower and upper cutoff frequencies need an attenuation of -6 dB,
such that a sinusoid at the crossover frequency has gain of 0 dB. The attenuation
of -6 dB is achieved through a series connection of two octave filters with -3 dB
attenuation.

• One-third octave filters are band-pass filters (see Fig. 5.3) with cutoff frequencies
given by

$f_u = \sqrt[3]{2} \cdot f_l$, (5.4)

$f_c = \sqrt[6]{2} \cdot f_l$. (5.5)

The attenuation at the lower and upper cutoff frequency is -3 dB. One-third octave
filters split an octave into three frequency bands (see Fig. 5.3).

Figure 5.2 Logarithmic magnitude responses of band-pass filters with constant relative bandwidth:
(a) versus logarithmic frequency, (b) versus linear frequency.



Figure 5.3 Linear magnitude responses of octave filters and decomposition of an octave band by
three one-third octave filters. [Panel title: octave band-pass filters.]



• Shelving filters and peak filters are special weighting filters, which are based on 
low-pass/high-pass/band-pass filters and a direct path (see Section 5.2.2). They have 
no stop-band compared to low-pass/high-pass/band-pass filters. They are used in 

Figure 5.4 Parallel connection of band-pass filters (BP) for octave/one-third octave equalizers with
gain factors ($A_i$ for octave or one-third octave band). [Diagram: input x(n), output y(n);
magnitude axis |H(f)| in dB from -12 to +12, frequency axis f in Hz.]



a series connection of shelving and peak filters as shown in Fig. 5.5. The lower 
frequency range is equalized by low-pass shelving filters and the higher frequencies 
are modified by high-pass shelving filters. Both filter types allow the adjustment of 
cutoff frequency and gain factor. For the mid-frequency range a series connection 
of peak filters with variable center frequency, bandwidth, and gain factor are used. 
These shelving and peak filters can also be applied for octave and one-third octave 
equalizers in a series connection. 





Figure 5.5 Series connection of shelving and peak filters (low-frequency LF, high-frequency HF).
[Diagram: input x(n), output y(n), frequency axis f in Hz; parameters of the shelving filters:
cutoff frequency $f_c$, gain G in dB; parameters of the peak filters: center frequency $f_{ci}$,
bandwidth $f_{bi}$, gain G in dB.]



• Weighting filters are used for signal level and noise measurement applications. The
signal from a device under test is first passed through the weighting filter and then a
root mean square or peak value measurement is performed. The two most often used
filters are the A-weighting filter and the CCIR-468 weighting filter (see Fig. 5.6).
Both weighting filters take the increased sensitivity of the human perception in the
1-6 kHz frequency range into account. The magnitude response of both
filters crosses 0 dB at 1 kHz. The CCIR-468 weighting filter has a gain of 12 dB at
6 kHz. A variant of the CCIR-468 filter is the ITU-ARM 2 kHz weighting filter,
which is a version of the CCIR-468 filter tilted down by 5.6 dB and crosses 0 dB at
2 kHz.




Figure 5.6 Magnitude responses of weighting filters for root mean square and peak value
measurements. [Frequency axis: f in Hz.]
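The band definitions (5.1)-(5.5) can be checked numerically. The Python sketch below is
an illustration with names chosen here; the starting center frequency of 31.25 Hz in the
usage line is an arbitrary example value and is not taken from the text.

```python
import math


def band_edges(f_c, bands_per_octave=1):
    """Lower and upper cutoff frequency of a band with center frequency f_c."""
    ratio = 2.0 ** (1.0 / bands_per_octave)  # f_u / f_l: 2 for octave, 2**(1/3) for one-third octave
    return f_c / math.sqrt(ratio), f_c * math.sqrt(ratio)


def center_frequencies(f_start, count, bands_per_octave=1):
    """Center frequencies of adjacent bands starting at f_start."""
    step = 2.0 ** (1.0 / bands_per_octave)
    return [f_start * step ** i for i in range(count)]


f_l, f_u = band_edges(1000.0)     # octave band around 1 kHz
print(f_l, f_u, f_u - f_l)        # cutoff frequencies and bandwidth f_b
print(center_frequencies(31.25, 10))
```

Because $f_c = \sqrt{f_l \cdot f_u}$, the relative bandwidth $f_b/f_c$ is the same for every
band of a given type, which is the constant relative bandwidth property shown in Fig. 5.2.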



5.2 Recursive Audio Filters 

5.2.1 Design 

A certain filter response can be approximated by two kinds of transfer function. On the one
hand, the combination of poles and zeros leads to a very low-order transfer function H(z) in
fractional form, which solves the given approximation problem. The digital implementation
of this transfer function needs recursive procedures owing to its poles. On the other hand,
the approximation problem can be solved by placing only zeros in the z-plane. This transfer
function H(z) has, besides its zeros, a corresponding number of poles at the origin of
the z-plane. The order of this transfer function, for the same approximation conditions, is
substantially higher than for transfer functions consisting of poles and zeros. In view of
an economical implementation of a filter algorithm in terms of computational complexity,
recursive filters achieve shorter computing time owing to their lower order. For a sampling
rate of 48 kHz, the algorithm has 20.83 µs processing time available. With the DSPs
presently available it is easy to implement recursive digital filters for audio applications
within this sampling period using only one DSP. To design the typical audio equalizers
we will start with filter designs in the S-domain. These filters will then be mapped to the
Z-domain by the bilinear transformation.
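The available processing time per sample translates directly into a cycle budget. The
following Python sketch uses names chosen here, and the cycle time of 10 ns is only an
example value of the kind listed in Table 4.2.

```python
def sampling_period_us(fs_hz):
    """Time between two samples in microseconds."""
    return 1e6 / fs_hz


def cycles_per_sample(fs_hz, cycle_time_ns):
    """Whole processor cycles that fit into one sampling period."""
    return int((1e9 / fs_hz) // cycle_time_ns)


print(round(sampling_period_us(48000), 2))  # processing time per sample in µs
print(cycles_per_sample(48000, 10.0))       # cycle budget for a 10 ns cycle time
```

All filters of an equalizer, including data input and output, have to fit into this
budget for every sample.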

Low-pass/High-pass Filters. In order to limit the audio spectrum, low-pass and high-pass
filters with Butterworth response are used in analog mixers. They offer a monotonic
pass-band and a monotonically decreasing stop-band attenuation per octave (n · 6 dB/oct.)
that is determined by the filter order n. Low-pass filters of the second and fourth order
are commonly used. The normalized and denormalized second-order low-pass transfer
functions are given by

$H_{LP}(s) = \frac{1}{s^2 + \frac{1}{Q_\infty} s + 1}$ and $H_{LP}(s) = \frac{\omega_c^2}{s^2 + \frac{\omega_c}{Q_\infty} s + \omega_c^2}$, (5.6)

where $\omega_c$ is the cutoff frequency and $Q_\infty$ is the pole quality factor. The Q-factor
$Q_\infty$ of a Butterworth approximation is equal to $1/\sqrt{2}$. The denormalization of a
transfer function is obtained by replacing the Laplace variable s by $s/\omega_c$ in the
normalized transfer function. The corresponding second-order high-pass transfer functions

$H_{HP}(s) = \frac{s^2}{s^2 + \frac{1}{Q_\infty} s + 1}$ and $H_{HP}(s) = \frac{s^2}{s^2 + \frac{\omega_c}{Q_\infty} s + \omega_c^2}$ (5.7)

are obtained by a low-pass to high-pass transformation. Figure 5.7 shows the pole-zero
locations in the s-plane. The amplitude frequency responses of a high-pass filter with a 3 dB
cutoff frequency of 50 Hz and a low-pass filter with a 3 dB cutoff frequency of 5000 Hz
are shown in Fig. 5.8. Second- and fourth-order filters are shown.

Figure 5.7 Pole-zero location for (a) second-order low-pass and (b) second-order high-pass. 
Figure 5.8 Frequency response of low-pass and high-pass filters - high-pass $f_c = 50$ Hz 
(second/fourth order), low-pass $f_c = 5000$ Hz (second/fourth order). 

Table 5.1 summarizes the transfer functions of low-pass and high-pass filters with 
Butterworth response. 

Table 5.1 Transfer functions of low-pass and high-pass filters. 

Low-pass, second order:   $H(s) = \dfrac{1}{s^2 + \sqrt{2}s + 1}$ 
Low-pass, fourth order:   $H(s) = \dfrac{1}{(s^2 + 1.848s + 1)(s^2 + 0.765s + 1)}$ 
High-pass, second order:  $H(s) = \dfrac{s^2}{s^2 + \sqrt{2}s + 1}$ 
High-pass, fourth order:  $H(s) = \dfrac{s^4}{(s^2 + 1.848s + 1)(s^2 + 0.765s + 1)}$ 



Band-pass and band-stop filters. The normalized and denormalized band-pass transfer 
functions of second order are 

\[ H_{BP}(s) = \frac{\frac{1}{Q_\infty}s}{s^2 + \frac{1}{Q_\infty}s + 1} \quad\text{and}\quad H_{BP}(s) = \frac{\frac{\omega_c}{Q_\infty}s}{s^2 + \frac{\omega_c}{Q_\infty}s + \omega_c^2}, \tag{5.8} \]

and the band-stop transfer functions are given by 

\[ H_{BS}(s) = \frac{s^2 + 1}{s^2 + \frac{1}{Q_\infty}s + 1} \quad\text{and}\quad H_{BS}(s) = \frac{s^2 + \omega_c^2}{s^2 + \frac{\omega_c}{Q_\infty}s + \omega_c^2}. \tag{5.9} \]

The relative bandwidth can be expressed by the Q-factor 

\[ Q_\infty = \frac{f_c}{f_b}, \tag{5.10} \]

which is the ratio of center frequency $f_c$ and the 3 dB bandwidth given by $f_b$. The magnitude 
responses of band-pass filters with constant relative bandwidth are shown in Fig. 5.2. 
Such filters are also called constant-Q filters. The geometric symmetric behavior of the 
frequency response regarding the center frequency $f_c$ is clearly noticeable (symmetry 
regarding the center frequency using a logarithmic frequency axis). 
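The geometric symmetry can be checked numerically. The following is a minimal sketch, assuming the normalized band-pass form of (5.8); the helper names `bandpass2` and `band_edges` are illustrative and not taken from the text. It computes the two 3 dB band edges in closed form and confirms that their difference is $f_c/Q_\infty$ while their product is $f_c^2$.

```python
import math

def bandpass2(f, fc, q):
    """Second-order band-pass of (5.8), evaluated at s = j*f/fc (normalized)."""
    s = 1j * f / fc
    return (s / q) / (s * s + s / q + 1)

def band_edges(fc, q):
    """Lower and upper 3 dB frequencies, obtained from |H|^2 = 1/2."""
    root = math.sqrt(1.0 + 1.0 / (4.0 * q * q))
    return fc * (root - 1.0 / (2.0 * q)), fc * (root + 1.0 / (2.0 * q))

fc, q = 1000.0, 2.0
f_lo, f_hi = band_edges(fc, q)
bandwidth = f_hi - f_lo              # equals fc / q
geometric_mean = math.sqrt(f_lo * f_hi)  # equals fc
```

Because the edges multiply to $f_c^2$, they sit at equal distances from $f_c$ on a logarithmic frequency axis, which is the symmetry described above.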

Shelving Filters. Besides the purely band-limiting filters like low-pass and high-pass 
filters, shelving filters are used to perform weighting of certain frequencies. A simple 
approach for a first-order low-pass shelving filter is given by 

\[ H(s) = 1 + H_{LP}(s) = 1 + \frac{H_0}{s + 1}. \tag{5.11} \]

It consists of a first-order low-pass filter with dc amplification of $H_0$ connected in parallel 
with an all-pass system of transfer function equal to 1. Equation (5.11) can be written as 

\[ H(s) = \frac{s + (1 + H_0)}{s + 1} = \frac{s + V_0}{s + 1}, \tag{5.12} \]

where $V_0$ determines the amplification at $\omega = 0$. By changing the parameter $V_0$, any desired 
boost ($V_0 > 1$) and cut ($V_0 < 1$) level can be adjusted. Figure 5.9 shows the frequency 
responses for $f_c = 100$ Hz. For $V_0 < 1$, the cutoff frequency is dependent on $V_0$ and is 
moved toward lower frequencies. 




Figure 5.9 Frequency response of transfer function (5.12) with varying $V_0$ and cutoff frequency 
$f_c = 100$ Hz. 



In order to obtain a symmetrical frequency response with respect to the zero decibel line 
without changing the cutoff frequency, it is necessary to invert the transfer function (5.12) 
in the case of cut ($V_0 < 1$). This has the effect of swapping poles with zeros and leads to 
the transfer function 

\[ H(s) = \frac{s + 1}{s + V_0} \tag{5.13} \]

for the cut case. Figure 5.10 shows the corresponding frequency responses for varying $V_0$. 
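The symmetry obtained by inversion can be illustrated with a short sketch. It assumes that the same level parameter $V_0 > 1$ is used for both cases, as in (5.15) and (5.16), so that (5.12) boosts by $V_0$ and (5.13) cuts by $1/V_0$; the function names are illustrative only.

```python
import math

def shelving_boost(f, fc, v0):
    """First-order low-frequency shelving filter (5.12) at s = j*f/fc."""
    s = 1j * f / fc
    return (s + v0) / (s + 1)

def shelving_cut(f, fc, v0):
    """Inverted form (5.13): poles and zeros of (5.12) are swapped."""
    s = 1j * f / fc
    return (s + 1) / (s + v0)

def to_db(h):
    return 20.0 * math.log10(abs(h))

fc, v0 = 100.0, 4.0  # v0 = 4 corresponds to about 12 dB
# Boost and cut curves mirror each other about the 0 dB line at every frequency.
mirror_error = max(
    abs(to_db(shelving_boost(f, fc, v0)) + to_db(shelving_cut(f, fc, v0)))
    for f in (0.0, 20.0, 100.0, 1000.0, 20000.0)
)
```

Since one response is the exact reciprocal of the other, the two magnitude curves in decibels add up to zero at every frequency.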




Figure 5.10 Frequency responses of transfer function (5.13) with varying $V_0$ and cutoff frequency 
$f_c = 100$ Hz. 

Finally, Figure 5.11 shows the locations of poles and zeros for both the boost and the 
cut case. By moving zeros and poles on the negative $\sigma$-axis, boost and cut can be adjusted. 

Figure 5.11 Pole-zero locations of a first-order low-frequency shelving filter. 



The equivalent shelving filter for high frequencies can be obtained by 

\[ H(s) = 1 + H_{HP}(s) = 1 + \frac{H_0 s}{s + 1}, \tag{5.14} \]

which is a parallel connection of a first-order high-pass with gain $H_0$ and a system with 
transfer function equal to 1. In the boost case the transfer function can be written with $V_0 = 
H_0 + 1$ as 

\[ H(s) = \frac{sV_0 + 1}{s + 1}, \quad V_0 > 1, \tag{5.15} \]

and for cut we get 

\[ H(s) = \frac{s + 1}{sV_0 + 1}, \quad V_0 > 1. \tag{5.16} \]

The parameter $V_0$ determines the value of the transfer function $H(s)$ at $\omega = \infty$ for 
high-frequency shelving filters. 

In order to increase the slope of the filter response in the transition band, a general 
second-order transfer function 

\[ H(s) = \frac{a_2 s^2 + a_1 s + a_0}{s^2 + \sqrt{2}s + 1} \tag{5.17} \]

is considered, in which complex zeros are added to the complex poles. The calculation of 
poles leads to 

\[ s_{\infty 1/2} = \sqrt{\tfrac{1}{2}}\,(-1 \pm j). \tag{5.18} \]

If the complex zeros 

\[ s_{\circ 1/2} = \sqrt{\tfrac{V_0}{2}}\,(-1 \pm j) \tag{5.19} \]

are moved on a straight line with the help of the parameter $V_0$ (see Fig. 5.12), the transfer 
function 

\[ H(s) = \frac{s^2 + \sqrt{2V_0}\,s + V_0}{s^2 + \sqrt{2}s + 1} \tag{5.20} \]

of a second-order low-frequency shelving filter is obtained. The parameter $V_0$ determines 
the boost for low frequencies. The cut case can be achieved by inversion of (5.20). 

Figure 5.12 Pole-zero locations of a second-order low-frequency shelving filter. 



A low-pass to high-pass transformation of (5.20) provides the transfer function 

\[ H(s) = \frac{V_0 s^2 + \sqrt{2V_0}\,s + 1}{s^2 + \sqrt{2}s + 1} \tag{5.21} \]

of a second-order high-frequency shelving filter. The zeros 

\[ s_{\circ 1/2} = \sqrt{\tfrac{1}{2V_0}}\,(-1 \pm j) \tag{5.22} \]

are moved on a straight line toward the origin with increasing $V_0$ (see Fig. 5.13). The cut 
case is obtained by inverting the transfer function (5.21). Figure 5.14 shows the amplitude 
frequency response of a second-order low-frequency shelving filter with cutoff frequency 
100 Hz and a second-order high-frequency shelving filter with cutoff frequency 5000 Hz 
(parameter $V_0$). 



Figure 5.13 Pole-zero locations of second-order high-frequency shelving filter. 

Figure 5.14 Frequency responses of second-order low-/high-frequency shelving filters - 
low-frequency shelving filter $f_c = 100$ Hz (parameter $V_0$), high-frequency shelving filter $f_c = 5000$ Hz 
(parameter $V_0$). 

Peak Filter. Another equalizer used for boosting or cutting any desired frequency is the 
peak filter. A peak filter can be obtained by a parallel connection of a direct path and a 
band-pass according to 

\[ H(s) = 1 + H_{BP}(s). \tag{5.23} \]

With the help of a second-order band-pass transfer function 

\[ H_{BP}(s) = \frac{(H_0/Q_\infty)\,s}{s^2 + \frac{1}{Q_\infty}s + 1}, \tag{5.24} \]

the transfer function 

\[ H(s) = 1 + H_{BP}(s) = \frac{s^2 + \frac{1 + H_0}{Q_\infty}s + 1}{s^2 + \frac{1}{Q_\infty}s + 1} = \frac{s^2 + \frac{V_0}{Q_\infty}s + 1}{s^2 + \frac{1}{Q_\infty}s + 1} \tag{5.25} \]

of a peak filter can be derived. It can be shown that the maximum of the amplitude 
frequency response at the center frequency is determined by the parameter $V_0$. The relative 
bandwidth is fixed by the Q-factor. The geometrical symmetry of the frequency response 
relative to the center frequency remains constant for the transfer function of a peak 
filter (5.25). The poles and zeros lie on the unit circle. By adjusting the parameter $V_0$, the 
complex zeros are moved with respect to the complex poles. Figure 5.15 shows this for the 
boost and cut cases. With increasing Q-factor, the complex poles move toward the $j\omega$-axis 
on the unit circle. 
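Two of these properties are easy to confirm numerically. The sketch below evaluates the normalized peak filter (5.25) on the $j\omega$-axis; the function name `peak_filter` and the chosen parameter values are illustrative assumptions. It checks that the gain at the center frequency equals $V_0$ and that the magnitude is the same at $k f_c$ and $f_c/k$.

```python
import math

def peak_filter(f, fc, q, v0):
    """Peak filter (5.25) evaluated at s = j*f/fc (normalized frequency)."""
    s = 1j * f / fc
    return (s * s + (v0 / q) * s + 1) / (s * s + s / q + 1)

fc, q = 500.0, 1.25
v0 = 10.0 ** (16.0 / 20.0)          # 16 dB boost

center_gain = abs(peak_filter(fc, fc, q, v0))
# Geometric symmetry: one octave above and one octave below fc.
gain_above = abs(peak_filter(2.0 * fc, fc, q, v0))
gain_below = abs(peak_filter(fc / 2.0, fc, q, v0))
```

At $s = j$ the terms $s^2 + 1$ cancel in numerator and denominator, which leaves the ratio $V_0$ and explains why the center gain is set by $V_0$ alone.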

Figure 5.15 Pole-zero locations of a second-order peak filter. 

Figure 5.16 shows the amplitude frequency response of a peak filter by changing the 
parameter $V_0$ at a center frequency of 500 Hz and a Q-factor of 1.25. Figure 5.17 shows 
the variation of the Q-factor at a center frequency of 500 Hz, a boost/cut of ±16 dB 
and Q-factor of 1.25. Finally, the variation of the center frequency with boost and cut of 
±16 dB and a Q-factor of 1.25 is shown in Fig. 5.18. 

Figure 5.16 Frequency response of a peak filter - $f_c = 500$ Hz, $Q_\infty = 1.25$, cut parameter $V_0$. 



Mapping to Z-domain. In order to implement a digital filter, the filter designed in the 
S-domain with transfer function $H(s)$ is converted to the Z-domain with the help of a 
suitable transformation to obtain the transfer function $H(z)$. The impulse-invariant 
transformation is not suitable as it leads to overlapping effects if the transfer function $H(s)$ is not 
band-limited to half the sampling rate. An independent mapping of poles and zeros from 
the S-domain into poles and zeros in the Z-domain is possible with help of the bilinear 
transformation given by 

\[ s = \frac{2}{T}\,\frac{z - 1}{z + 1}. \tag{5.26} \]

Tables 5.2-5.5 contain the coefficients of the second-order transfer function 

\[ H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}, \tag{5.27} \]

which are determined by the bilinear transformation and the auxiliary variable $K = 
\tan(\omega_c T/2)$ for all audio filter types discussed. Further filter designs of peak and shelving 
filters are discussed in [Moo83, Whi86, Sha92, Bri94, Orf96a, Dat97, Cla00]. A method 
for reducing the warping effect of the bilinear transform is proposed in [Orf96b]. Strategies 
for time-variant switching of audio filters can be found in [Rab88, Mou90, Zöl93, Din95, 
Val98]. 



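Table 5.2 Low-pass/high-pass/band-pass filter design. 

Low-pass (second order) 

  $a_0 = \dfrac{K^2}{1 + \sqrt{2}K + K^2}$,  $a_1 = \dfrac{2K^2}{1 + \sqrt{2}K + K^2}$,  $a_2 = \dfrac{K^2}{1 + \sqrt{2}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \sqrt{2}K + K^2}$,  $b_2 = \dfrac{1 - \sqrt{2}K + K^2}{1 + \sqrt{2}K + K^2}$ 

High-pass (second order) 

  $a_0 = \dfrac{1}{1 + \sqrt{2}K + K^2}$,  $a_1 = \dfrac{-2}{1 + \sqrt{2}K + K^2}$,  $a_2 = \dfrac{1}{1 + \sqrt{2}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \sqrt{2}K + K^2}$,  $b_2 = \dfrac{1 - \sqrt{2}K + K^2}{1 + \sqrt{2}K + K^2}$ 

Band-pass (second order) 

  $a_0 = \dfrac{\frac{1}{Q}K}{1 + \frac{1}{Q}K + K^2}$,  $a_1 = 0$,  $a_2 = -\dfrac{\frac{1}{Q}K}{1 + \frac{1}{Q}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \frac{1}{Q}K + K^2}$,  $b_2 = \dfrac{1 - \frac{1}{Q}K + K^2}{1 + \frac{1}{Q}K + K^2}$ 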
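As an illustration of how the entries of Table 5.2 are used, the following sketch computes the second-order low-pass coefficients from $K = \tan(\omega_c T/2)$ and evaluates (5.27) on the unit circle. The function names and the example values (1 kHz cutoff at 48 kHz) are assumptions made for this example, not part of the original design tables.

```python
import cmath
import math

def lowpass_coeffs(fc, fs):
    """Second-order Butterworth low-pass coefficients following Table 5.2."""
    k = math.tan(math.pi * fc / fs)          # K = tan(omega_c * T / 2)
    den = 1.0 + math.sqrt(2.0) * k + k * k
    a = (k * k / den, 2.0 * k * k / den, k * k / den)
    b = (2.0 * (k * k - 1.0) / den, (1.0 - math.sqrt(2.0) * k + k * k) / den)
    return a, b

def freq_response(a, b, f, fs):
    """H(z) of (5.27) evaluated at z = exp(j*2*pi*f/fs)."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1
    return (a[0] + a[1] * z1 + a[2] * z1 * z1) / (1.0 + b[0] * z1 + b[1] * z1 * z1)

fs, fc = 48000.0, 1000.0
a, b = lowpass_coeffs(fc, fs)
gain_dc = abs(freq_response(a, b, 0.0, fs))
gain_fc = abs(freq_response(a, b, fc, fs))
gain_nyquist = abs(freq_response(a, b, fs / 2.0, fs))
```

Because $K$ already contains the tangent of the cutoff frequency, the bilinear transformation places the 3 dB point exactly at $f_c$; only the frequencies away from $f_c$ are warped.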



5.2.2 Parametric Filter Structures 

Parametric filter structures allow direct access to the parameters of the transfer function, 
like center/cutoff frequency, bandwidth and gain, via control of associated coefficients. To 
modify one of these parameters, it is therefore not necessary to compute a complete set 
of coefficients for a second-order transfer function, but instead only one coefficient in the 
filter structure is calculated. 

Table 5.3 Peak filter design. 

Peak (boost $V_0 = 10^{G/20}$) 

  $a_0 = \dfrac{1 + \frac{V_0}{Q_\infty}K + K^2}{1 + \frac{1}{Q_\infty}K + K^2}$,  $a_1 = \dfrac{2(K^2 - 1)}{1 + \frac{1}{Q_\infty}K + K^2}$,  $a_2 = \dfrac{1 - \frac{V_0}{Q_\infty}K + K^2}{1 + \frac{1}{Q_\infty}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \frac{1}{Q_\infty}K + K^2}$,  $b_2 = \dfrac{1 - \frac{1}{Q_\infty}K + K^2}{1 + \frac{1}{Q_\infty}K + K^2}$ 

Peak (cut $V_0 = 10^{-G/20}$) 

  $a_0 = \dfrac{1 + \frac{1}{Q_\infty}K + K^2}{1 + \frac{V_0}{Q_\infty}K + K^2}$,  $a_1 = \dfrac{2(K^2 - 1)}{1 + \frac{V_0}{Q_\infty}K + K^2}$,  $a_2 = \dfrac{1 - \frac{1}{Q_\infty}K + K^2}{1 + \frac{V_0}{Q_\infty}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \frac{V_0}{Q_\infty}K + K^2}$,  $b_2 = \dfrac{1 - \frac{V_0}{Q_\infty}K + K^2}{1 + \frac{V_0}{Q_\infty}K + K^2}$ 

Table 5.4 Low-frequency shelving filter design. 

Low-frequency shelving (boost $V_0 = 10^{G/20}$) 

  $a_0 = \dfrac{1 + \sqrt{2V_0}K + V_0K^2}{1 + \sqrt{2}K + K^2}$,  $a_1 = \dfrac{2(V_0K^2 - 1)}{1 + \sqrt{2}K + K^2}$,  $a_2 = \dfrac{1 - \sqrt{2V_0}K + V_0K^2}{1 + \sqrt{2}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \sqrt{2}K + K^2}$,  $b_2 = \dfrac{1 - \sqrt{2}K + K^2}{1 + \sqrt{2}K + K^2}$ 

Low-frequency shelving (cut $V_0 = 10^{-G/20}$) 

  $a_0 = \dfrac{1 + \sqrt{2}K + K^2}{1 + \sqrt{2V_0}K + V_0K^2}$,  $a_1 = \dfrac{2(K^2 - 1)}{1 + \sqrt{2V_0}K + V_0K^2}$,  $a_2 = \dfrac{1 - \sqrt{2}K + K^2}{1 + \sqrt{2V_0}K + V_0K^2}$, 
  $b_1 = \dfrac{2(V_0K^2 - 1)}{1 + \sqrt{2V_0}K + V_0K^2}$,  $b_2 = \dfrac{1 - \sqrt{2V_0}K + V_0K^2}{1 + \sqrt{2V_0}K + V_0K^2}$ 

Table 5.5 High-frequency shelving filter design. 

High-frequency shelving (boost $V_0 = 10^{G/20}$) 

  $a_0 = \dfrac{V_0 + \sqrt{2V_0}K + K^2}{1 + \sqrt{2}K + K^2}$,  $a_1 = \dfrac{2(K^2 - V_0)}{1 + \sqrt{2}K + K^2}$,  $a_2 = \dfrac{V_0 - \sqrt{2V_0}K + K^2}{1 + \sqrt{2}K + K^2}$, 
  $b_1 = \dfrac{2(K^2 - 1)}{1 + \sqrt{2}K + K^2}$,  $b_2 = \dfrac{1 - \sqrt{2}K + K^2}{1 + \sqrt{2}K + K^2}$ 

High-frequency shelving (cut $V_0 = 10^{-G/20}$) 

  $a_0 = \dfrac{1 + \sqrt{2}K + K^2}{V_0 + \sqrt{2V_0}K + K^2}$,  $a_1 = \dfrac{2(K^2 - 1)}{V_0 + \sqrt{2V_0}K + K^2}$,  $a_2 = \dfrac{1 - \sqrt{2}K + K^2}{V_0 + \sqrt{2V_0}K + K^2}$, 
  $b_1 = \dfrac{2(K^2/V_0 - 1)}{1 + \sqrt{2/V_0}K + K^2/V_0}$,  $b_2 = \dfrac{1 - \sqrt{2/V_0}K + K^2/V_0}{1 + \sqrt{2/V_0}K + K^2/V_0}$ 

Independent control of gain, cutoff/center frequency and bandwidth for shelving and 
peak filters is achieved by a feed forward (FF) structure for boost and a feed backward 
(FB) structure for cut as shown in Fig. 5.19. The corresponding transfer functions are: 

\[ G_{FW}(z) = 1 + H_0 H(z), \tag{5.28} \]

\[ G_{FB}(z) = \frac{1}{1 + H_0 H(z)}. \tag{5.29} \]



Figure 5.19 Filter structure for implementing boost and cut filters. 

The boost/cut factor is $V_0 = 1 + H_0$. For digital filter implementations, it is necessary 
for the FB case that the inner transfer function be of the form $H(z) = z^{-1}H_1(z)$ to ensure 
causality. A parametric filter structure proposed by Harris [Har93] is based on the FF/FB 
technique, but the frequency response shows slight deviations near $z = 1$ and $z = -1$ from 
that desired. This is due to the $z^{-1}$ in the FF/FB branch. Delay-free loops inside filter 
computations can be solved by the methods presented in [Har98, Fon01, Fon03]. Higher-order 
parametric filter designs have been introduced in [Kei04, Orf05, Hol06a, Hol06b, 
Hol06c, Hol06d]. It is possible to implement typical audio filters with only an FF structure. 
The complete decoupling of the control parameters is possible for the boost case, but there 
remains a coupling between bandwidth and gain factor for the cut case. In the following, 
two approaches for parametric audio filter structures based on an all-pass decomposition of 
the transfer function will be discussed. 



Regalia Filter [Reg87]. The denormalized transfer function of a first-order shelving filter 
is given by 

\[ H(s) = \frac{s + V_0\omega_c}{s + \omega_c} \tag{5.30} \]

with 

\[ H(0) = V_0, \qquad H(\infty) = 1. \]

A decomposition of (5.30) leads to 

\[ H(s) = \frac{s}{s + \omega_c} + V_0\frac{\omega_c}{s + \omega_c}. \tag{5.31} \]

The low-pass and high-pass transfer functions in (5.31) can be expressed by an all-pass 
decomposition of the form 

\[ \frac{s}{s + \omega_c} = \frac{1}{2}\left[1 + \frac{s - \omega_c}{s + \omega_c}\right], \tag{5.32} \]

\[ \frac{V_0\omega_c}{s + \omega_c} = \frac{V_0}{2}\left[1 - \frac{s - \omega_c}{s + \omega_c}\right]. \tag{5.33} \]

With the all-pass transfer function 

\[ A_B(s) = \frac{s - \omega_c}{s + \omega_c} \tag{5.34} \]

for boost, (5.30) can be rewritten as 

\[ H(s) = \tfrac{1}{2}[1 + A_B(s)] + \tfrac{1}{2}V_0[1 - A_B(s)]. \tag{5.35} \]

The bilinear transformation 

\[ s = \frac{2}{T}\,\frac{z - 1}{z + 1} \]

leads to 

\[ H(z) = \tfrac{1}{2}[1 + A_B(z)] + \tfrac{1}{2}V_0[1 - A_B(z)] \tag{5.36} \]

with 

\[ A_B(z) = -\frac{z^{-1} + a_B}{1 + a_B z^{-1}} \tag{5.37} \]

and the frequency parameter 

\[ a_B = \frac{\tan(\omega_c T/2) - 1}{\tan(\omega_c T/2) + 1}. \tag{5.38} \]

A filter structure for direct implementation of (5.36) is presented in Fig. 5.20a. Other 
possible structures can be seen in Fig. 5.20b,c. For the cut case $V_0 < 1$, the cutoff frequency 
of the filter moves toward lower frequencies [Reg87]. 

Figure 5.20 Filter structures by Regalia. 
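The boost form (5.36) to (5.38) can be evaluated directly on the unit circle. The sketch below is a frequency-domain check only, not one of the structures of Fig. 5.20; the function names and the example values are assumptions. It confirms that the all-pass has unit magnitude and that the shelving filter reaches $V_0$ at dc and 1 at half the sampling rate.

```python
import cmath
import math

def allpass1(z1, a):
    """First-order all-pass (5.37); z1 stands for z^-1."""
    return -(z1 + a) / (1.0 + a * z1)

def regalia_shelving(f, fc, fs, v0):
    """Low-frequency shelving filter (5.36) with a_B from (5.38)."""
    k = math.tan(math.pi * fc / fs)       # tan(omega_c * T / 2)
    a_b = (k - 1.0) / (k + 1.0)
    z1 = cmath.exp(-2j * math.pi * f / fs)
    ap = allpass1(z1, a_b)
    return 0.5 * (1.0 + ap) + 0.5 * v0 * (1.0 - ap)

fs, fc, v0 = 48000.0, 100.0, 4.0
h_dc = regalia_shelving(0.0, fc, fs, v0)
h_nyquist = regalia_shelving(fs / 2.0, fc, fs, v0)
k = math.tan(math.pi * fc / fs)
ap_at_fc = allpass1(cmath.exp(-2j * math.pi * fc / fs), (k - 1.0) / (k + 1.0))
```

At the cutoff frequency the all-pass contributes a phase of 90 degrees, so the two branches of (5.36) are combined in quadrature there; only the single parameter $V_0$ changes the gain.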

In order to retain the cutoff frequency for the cut case [Zöl95], the denormalized transfer 
function of a first-order shelving filter (cut) 

\[ H(s) = \frac{s + \omega_c}{s + \omega_c/V_0}, \tag{5.39} \]

with the boundary conditions 

\[ H(0) = V_0, \qquad H(\infty) = 1, \]

can be decomposed as 

\[ H(s) = \frac{s}{s + \omega_c/V_0} + \frac{\omega_c}{s + \omega_c/V_0}. \tag{5.40} \]

With the all-pass decompositions 

\[ \frac{s}{s + \omega_c/V_0} = \frac{1}{2}\left[1 + \frac{s - \omega_c/V_0}{s + \omega_c/V_0}\right], \tag{5.41} \]

\[ \frac{\omega_c}{s + \omega_c/V_0} = \frac{V_0}{2}\left[1 - \frac{s - \omega_c/V_0}{s + \omega_c/V_0}\right], \tag{5.42} \]

and the all-pass transfer function 

\[ A_C(s) = \frac{s - \omega_c/V_0}{s + \omega_c/V_0} \tag{5.43} \]

for cut, (5.39) can be rewritten as 

\[ H(s) = \tfrac{1}{2}[1 + A_C(s)] + \tfrac{V_0}{2}[1 - A_C(s)]. \tag{5.44} \]

The bilinear transformation leads to 

\[ H(z) = \tfrac{1}{2}[1 + A_C(z)] + \tfrac{V_0}{2}[1 - A_C(z)] \tag{5.45} \]

with 

\[ A_C(z) = -\frac{z^{-1} + a_C}{1 + a_C z^{-1}} \tag{5.46} \]

and the frequency parameter 

\[ a_C = \frac{\tan(\omega_c T/2) - V_0}{\tan(\omega_c T/2) + V_0}. \tag{5.47} \]

Due to (5.45) and (5.36), boost and cut can be implemented with the same filter structure 
(see Fig. 5.20). However, it has to be noted that the frequency parameter $a_C$ as in (5.47) for 
cut depends on the cutoff frequency and gain. 

A second-order peak filter is obtained by a low-pass to band-pass transformation 
according to 

\[ z^{-1} \rightarrow -z^{-1}\,\frac{z^{-1} + d}{1 + d z^{-1}}. \tag{5.48} \]

For an all-pass as given in (5.37) and (5.46), the second-order all-pass is given by 

\[ A_{BC}(z) = \frac{z^{-2} + d(1 + a_{BC})z^{-1} + a_{BC}}{1 + d(1 + a_{BC})z^{-1} + a_{BC}z^{-2}} \tag{5.49} \]

with parameters (cut as in [Zöl95]) 

\[ d = -\cos(\Omega_c), \tag{5.50} \]

\[ V_0 = H(e^{j\Omega_c}), \tag{5.51} \]

\[ a_B = \frac{1 - \tan(\omega_b T/2)}{1 + \tan(\omega_b T/2)}, \tag{5.52} \]

\[ a_C = \frac{V_0 - \tan(\omega_b T/2)}{V_0 + \tan(\omega_b T/2)}. \tag{5.53} \]

The center frequency $f_c$ is fixed by the parameter $d$, the bandwidth $f_b$ by the parameters 
$a_B$ and $a_C$, and gain by the parameter $V_0$. 

Simplified All-pass Decomposition [Zöl95]. The transfer function of a first-order 
low-frequency shelving filter can be decomposed as 

\[ H(s) = \frac{s + V_0\omega_c}{s + \omega_c} \]

\[ = 1 + H_0\frac{\omega_c}{s + \omega_c} \tag{5.54} \]

\[ = 1 + \frac{H_0}{2}\left[1 - \frac{s - \omega_c}{s + \omega_c}\right] \tag{5.55} \]

with 

\[ V_0 = H(s = 0), \tag{5.56} \]

\[ H_0 = V_0 - 1, \tag{5.57} \]

\[ V_0 = 10^{G/20} \quad (G \text{ in dB}). \tag{5.58} \]


The transfer function (5.55) is composed of a direct branch and a low-pass filter. The 
first-order low-pass filter is again implemented by an all-pass decomposition. Applying 
the bilinear transformation to (5.55) leads to 

\[ H(z) = 1 + \frac{H_0}{2}[1 - A(z)] \tag{5.59} \]

with 

\[ A(z) = -\frac{z^{-1} + a_B}{1 + a_B z^{-1}}. \tag{5.60} \]

For cut, the following decomposition can be derived: 

\[ H(s) = \frac{s + \omega_c}{s + \omega_c/V_0} \tag{5.61} \]

\[ = 1 + (V_0 - 1)\frac{\omega_c/V_0}{s + \omega_c/V_0} \tag{5.62} \]

\[ = 1 + \frac{H_0}{2}\left[1 - \frac{s - \omega_c/V_0}{s + \omega_c/V_0}\right]. \tag{5.63} \]

The bilinear transformation applied to (5.63) again gives (5.59). The filter structure is 
identical for boost and cut. The frequency parameter $a_B$ for boost and $a_C$ for cut can be 
calculated as 

\[ a_B = \frac{\tan(\omega_c T/2) - 1}{\tan(\omega_c T/2) + 1}, \tag{5.64} \]

\[ a_C = \frac{\tan(\omega_c T/2) - V_0}{\tan(\omega_c T/2) + V_0}. \tag{5.65} \]

The transfer function of a first-order low-frequency shelving filter can be calculated as 

\[ H(z) = \frac{1 + (1 + a_{BC})\frac{H_0}{2} + \left(a_{BC} + (1 + a_{BC})\frac{H_0}{2}\right)z^{-1}}{1 + a_{BC}z^{-1}}. \tag{5.66} \]

With $A_1(z) = -A(z)$ the signal flow chart in Fig. 5.21 shows a first-order low-pass 
filter and a first-order low-frequency shelving filter. 
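A time-domain realization of (5.59) can be sketched as follows. It is one possible difference-equation form under stated assumptions, not the signal flow chart of Fig. 5.21: the all-pass (5.60) is computed as $y_A(n) = -a\,x(n) - x(n-1) - a\,y_A(n-1)$, the parameter $a$ is taken from (5.64) for boost and from (5.65) for cut, and the function name is illustrative.

```python
import math

def lf_shelving(x, fc, fs, gain_db):
    """First-order low-frequency shelving filter via all-pass, after (5.59)."""
    v0 = 10.0 ** (gain_db / 20.0)
    h0 = v0 - 1.0
    k = math.tan(math.pi * fc / fs)            # tan(omega_c * T / 2)
    # a_B for boost (5.64), a_C for cut (5.65)
    a = (k - 1.0) / (k + 1.0) if v0 >= 1.0 else (k - v0) / (k + v0)
    x_prev = ap_prev = 0.0
    y = []
    for xn in x:
        ap = -a * xn - x_prev - a * ap_prev    # all-pass output A(z) x(n)
        y.append(xn + 0.5 * h0 * (xn - ap))    # direct path plus weighted low-pass
        x_prev, ap_prev = xn, ap
    return y

fs, fc = 48000.0, 100.0
dc_input = [1.0] * 5000
nyquist_input = [(-1.0) ** n for n in range(5000)]
boost_dc = lf_shelving(dc_input, fc, fs, 12.0)[-1]
cut_dc = lf_shelving(dc_input, fc, fs, -12.0)[-1]
boost_nyquist = lf_shelving(nyquist_input, fc, fs, 12.0)[-1] / nyquist_input[-1]
```

Only $H_0$ and $a$ change between boost and cut, which reflects the statement that the filter structure is identical for both cases.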



Figure 5.21 Low-frequency shelving filter and first-order low-pass filter. 



The decomposition of a denormalized transfer function of a first-order high-frequency 
shelving filter can be given in the form 

\[ H(s) = \frac{sV_0 + \omega_c}{s + \omega_c} \]

\[ = 1 + H_0\frac{s}{s + \omega_c} \tag{5.67} \]

\[ = 1 + \frac{H_0}{2}\left[1 + \frac{s - \omega_c}{s + \omega_c}\right], \tag{5.68} \]

where 

\[ V_0 = H(s = \infty), \tag{5.69} \]

\[ H_0 = V_0 - 1. \tag{5.70} \]

The transfer function results by adding a high-pass filter to a constant. Applying the bilinear 
transformation to (5.68) gives 

\[ H(z) = 1 + \frac{H_0}{2}[1 + A(z)] \tag{5.71} \]

with 

\[ A(z) = -\frac{z^{-1} + a_B}{1 + a_B z^{-1}}. \tag{5.72} \]

For cut, the decomposition can be given by 

\[ H(s) = \frac{s + \omega_c}{s/V_0 + \omega_c} \tag{5.73} \]

\[ = 1 + (V_0 - 1)\frac{s}{s + V_0\omega_c} \tag{5.74} \]

\[ = 1 + \frac{H_0}{2}\left[1 + \frac{s - V_0\omega_c}{s + V_0\omega_c}\right], \tag{5.75} \]

which in turn results in (5.71) after a bilinear transformation. The boost and cut parameters 
can be calculated as 

\[ a_B = \frac{\tan(\omega_c T/2) - 1}{\tan(\omega_c T/2) + 1}, \tag{5.76} \]

\[ a_C = \frac{V_0\tan(\omega_c T/2) - 1}{V_0\tan(\omega_c T/2) + 1}. \tag{5.77} \]

The transfer function of a first-order high-frequency shelving filter can then be written as 

\[ H(z) = \frac{1 + (1 - a_{BC})\frac{H_0}{2} + \left(a_{BC} + (a_{BC} - 1)\frac{H_0}{2}\right)z^{-1}}{1 + a_{BC}z^{-1}}. \tag{5.78} \]

With $A_1(z) = -A(z)$ the signal flow chart in Fig. 5.22 shows a first-order high-pass filter 
and a high-frequency shelving filter. 



Figure 5.22 First-order high-frequency shelving and high-pass filters. 

The implementation of a second-order peak filter can be carried out with a low-pass to 
band-pass transformation of a first-order shelving filter. But the addition of a second-order 
band-pass filter to a constant branch also results in a peak filter. With the help of an all-pass 
implementation of a band-pass filter as given by 

\[ H(z) = \tfrac{1}{2}[1 - A_2(z)] \]

and 

\[ A_2(z) = \frac{-a_B + (d - d a_B)z^{-1} + z^{-2}}{1 + (d - d a_B)z^{-1} - a_B z^{-2}}, \]

a second-order peak filter can be expressed as 

\[ H(z) = 1 + \frac{H_0}{2}[1 - A_2(z)]. \]

The bandwidth parameters $a_B$ and $a_C$ for boost and cut are given by 

\[ a_B = \frac{\tan(\omega_b T/2) - 1}{\tan(\omega_b T/2) + 1}, \]

\[ a_C = \frac{\tan(\omega_b T/2) - V_0}{\tan(\omega_b T/2) + V_0}. \]

The center frequency parameter $d$ and the coefficient $H_0$ are given by 

\[ d = -\cos(\Omega_c), \]

\[ V_0 = H(e^{j\Omega_c}), \]

\[ H_0 = V_0 - 1. \]

The transfer function of a second-order peak filter results in 

\[ H(z) = \frac{1 + (1 + a_{BC})\frac{H_0}{2} + d(1 - a_{BC})z^{-1} + \left(-a_{BC} - (1 + a_{BC})\frac{H_0}{2}\right)z^{-2}}{1 + d(1 - a_{BC})z^{-1} - a_{BC}z^{-2}}. \]

The signal flow charts for a second-order peak filter and a second-order band-pass filter are 
shown in Fig. 5.23. 
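The decoupling of the three parameters can be seen in a short frequency-domain sketch of the all-pass based peak filter. The function name and the example values are assumptions; the formulas follow the second-order all-pass $A_2(z)$, the bandwidth parameters $a_B$ and $a_C$, and $d = -\cos(\Omega_c)$ given above.

```python
import cmath
import math

def peak_allpass(f, fc, fb, fs, gain_db):
    """Second-order peak filter H(z) = 1 + H0/2 * [1 - A2(z)] on the unit circle."""
    v0 = 10.0 ** (gain_db / 20.0)
    h0 = v0 - 1.0
    t = math.tan(math.pi * fb / fs)                  # tan(omega_b * T / 2)
    a = (t - 1.0) / (t + 1.0) if v0 >= 1.0 else (t - v0) / (t + v0)
    d = -math.cos(2.0 * math.pi * fc / fs)           # center frequency parameter
    z1 = cmath.exp(-2j * math.pi * f / fs)           # z^-1
    a2 = (-a + d * (1.0 - a) * z1 + z1 * z1) / (1.0 + d * (1.0 - a) * z1 - a * z1 * z1)
    return 1.0 + 0.5 * h0 * (1.0 - a2)

fs, fc, fb = 48000.0, 1000.0, 100.0
gain_boost = peak_allpass(fc, fc, fb, fs, 18.0)
gain_cut = peak_allpass(fc, fc, fb, fs, -18.0)
gain_dc = peak_allpass(0.0, fc, fb, fs, 18.0)
gain_nyquist = peak_allpass(fs / 2.0, fc, fb, fs, 18.0)
```

At the center frequency the all-pass evaluates to $-1$, so the gain there is $1 + H_0 = V_0$ regardless of $d$ and the bandwidth parameter; at dc and at half the sampling rate the all-pass evaluates to $+1$ and the filter is transparent.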

Figure 5.23 Second-order peak filter and band-pass filter. 

The frequency responses for high-frequency shelving, low-frequency shelving and peak 
filters are shown in Figs 5.24, 5.25 and 5.26. 

Figure 5.26 Second-order peak filter ($G = \pm 18$ dB, $f_c = 50, 100, 1000, 3000, 10000$ Hz, $f_b = 
100$ Hz). 



5.2.3 Quantization Effects 

The limited word-length for digital recursive filters leads to two different types of 
quantization error. The quantization of the coefficients of a digital filter results in linear distortion 
which can be observed as a deviation from the ideal frequency response. The quantization 
of the signal inside a filter structure is responsible for the maximum dynamic range and 
determines the noise behavior of the filter. Owing to rounding operations in a filter 
structure, roundoff noise is produced. Another effect of the signal quantization is limit cycles. 
These can be classified as overflow limit cycles, small-scale limit cycles and limit cycles 
correlated with the input signal. Limit cycles are very disturbing owing to their narrow-band 
(sinusoidal) nature. The overflow limit cycles can be avoided by suitable scaling of the 
input signal. The effects of the other errors mentioned above can be reduced by increasing the 
word-lengths of the coefficients and the state variables of the filter structure. 

The noise behavior and coefficient sensitivity of a filter structure depend on the 
topology and the cutoff frequency (position of the poles in the Z-domain) of the filter. Since 
common audio filters operate between 20 Hz and 20 kHz at a sampling rate of 48 kHz, 
the filter structures are subjected to especially strict criteria with respect to error behavior. 
The frequency range for equalizers is between 20 Hz and 4-6 kHz because the human 
voice and many musical instruments have their formants in that frequency region. For 
given coefficient and signal word-lengths (as in a digital signal processor), a filter structure 
with low roundoff noise for audio application can lead to a suitable solution. For this, the 
following second-order filter structures are compared. 

The basis of the following considerations is the relationship between the coefficient 
sensitivity and roundoff noise. This was first stated by Fettweis [Fet72]. By increasing the 
pole density in a certain region of the z-plane, the coefficient sensitivity and the roundoff 
noise of the filter structure are reduced. Owing to these improvements, the coefficient 
word-length as well as signal word-length can be reduced. Work in designing digital filters 
with minimum word-length for coefficients and state variables was first carried out by 
Avenhaus [Ave71]. 

Typical audio filters like high-/low-pass, peak/shelving filters can be described by the 
second-order transfer function 

\[ H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}. \tag{5.88} \]

The recursive part of the difference equation which can be derived from the transfer 
function (5.88) is considered more closely, since it plays a major role in affecting the error 
behavior. Owing to the quantization of the coefficients in the denominator in (5.88), the 
distribution of poles in the z-plane is restricted (see Fig. 5.27 for 6-bit quantization of 
coefficients). The pole distribution in the second quadrant of the z-plane is the mirror image 
of the first quadrant. Figure 5.28 shows a block diagram of the recursive part. Another 
equivalent representation of the denominator is given by 

\[ H(z) = \frac{N(z)}{1 - 2r\cos\varphi\, z^{-1} + r^2 z^{-2}}. \tag{5.89} \]

Here $r$ is the radius and $\varphi$ the corresponding phase of the complex poles. By quantizing 
these parameters, the pole distribution is altered, in contrast to the case where $b_1$ and $b_2$ are 
quantized as in (5.88). 
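The restricted pole distribution of the direct form can be reproduced with a small enumeration. The number format in this sketch is an assumption made for illustration (a uniform 6-bit grid with $b_1 \in [-2, 2)$ and $b_2 \in (0, 1)$); the function names are likewise illustrative. Comparing the comparator points $z = 1$ and $z = j$ shows how few pole positions are available at low frequencies.

```python
import math

def direct_form_poles(bits=6):
    """Upper-half-plane poles reachable with quantized b1 and b2 of (5.88).

    Assumed format (not from the text): b1 in [-2, 2), b2 in (0, 1),
    both on a uniform grid with 2**bits levels.
    """
    step_b1 = 2.0 ** -(bits - 2)
    step_b2 = 2.0 ** -(bits - 1)
    poles = []
    for i in range(-2 ** (bits - 1), 2 ** (bits - 1)):
        b1 = i * step_b1
        for j in range(1, 2 ** (bits - 1)):
            b2 = j * step_b2
            disc = b1 * b1 - 4.0 * b2
            if disc < 0.0:  # complex-conjugate pole pair with radius sqrt(b2) < 1
                poles.append(complex(-b1 / 2.0, math.sqrt(-disc) / 2.0))
    return poles

def count_near(poles, center, radius=0.2):
    """Number of realizable poles within `radius` of a point in the z-plane."""
    return sum(1 for p in poles if abs(p - center) < radius)

poles = direct_form_poles(6)
near_dc = count_near(poles, 1 + 0j)       # low-frequency region around z = 1
near_quarter = count_near(poles, 1j)      # region around a quarter of fs
```

The real part $-b_1/2$ lies on a uniform grid while the radius $\sqrt{b_2}$ does not, which thins out the realizable poles close to the real axis.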



Figure 5.27 Direct-form structure - pole distribution (6-bit quantization). 

Figure 5.28 Direct-form structure - block diagram of recursive part. 



The state variable structure [Mul76, Bom85] is based on the approach by Gold and 
Rader [Gol67], which is given by 

\[ H(z) = \frac{N(z)}{1 - 2\,\mathrm{Re}\{z_\infty\}z^{-1} + \left(\mathrm{Re}\{z_\infty\}^2 + \mathrm{Im}\{z_\infty\}^2\right)z^{-2}}. \tag{5.90} \]

The possible pole locations are shown in Fig. 5.29 for 6-bit quantization (a block diagram 
of the recursive part is shown in Fig. 5.30). Owing to the quantization of real and imaginary 
parts, a uniform grid of different pole locations results. In contrast to direct quantization of 
the coefficients $b_1$ and $b_2$ in the denominator, the quantization of the real and imaginary 
parts leads to an increase in the pole density at $z = 1$. The possible pole locations in the 
second quadrant in the z-plane are the mirror images of the ones in the first quadrant. 



Figure 5.29 Gold and Rader - pole distribution (6-bit quantization). 

In [Kin72] a filter structure is suggested which has a pole distribution as shown in 
Fig. 5.31 (for a block diagram of the recursive part, see Fig. 5.32). 

Figure 5.31 Kingsbury - pole distribution (6-bit quantization). 

The corresponding transfer function, 

\[ H(z) = \frac{N(z)}{1 - (2 - k_1k_2 - k_1^2)z^{-1} + (1 - k_1k_2)z^{-2}}, \tag{5.91} \]

shows that in this case the coefficients $b_1$ and $b_2$ can be obtained by a linear combination 
of the quantized coefficients $k_1$ and $k_2$. The distance $d$ of the pole from the point $z = 1$ 
determines the coefficients 

\[ k_1 = d = \sqrt{1 - 2r\cos\varphi + r^2}, \tag{5.92} \]

\[ k_2 = \frac{1 - r^2}{k_1}, \tag{5.93} \]

as illustrated in Fig. 5.33. 

Figure 5.33 Geometric interpretation. 




The filter structures under consideration showed that by a suitable linear combination 
of quantized coefficients, any desired pole distribution can be obtained. An increase of the 
pole density at $z = 1$ can be achieved by influencing the linear relationship between the 
coefficient $k_1$ and the distance $d$ from $z = 1$ [Zöl89, Zöl90]. The nonlinear relationship of 
the new coefficients gives the following structure with the transfer function 

\[ H(z) = \frac{N(z)}{1 - (2 - z_1z_2 - z_1^3)z^{-1} + (1 - z_1z_2)z^{-2}} \tag{5.94} \]

and coefficients 

\[ z_1 = \sqrt[3]{1 + b_1 + b_2}, \tag{5.95} \]

\[ z_2 = \frac{1 - b_2}{z_1}, \tag{5.96} \]

with 

\[ z_1 = \sqrt[3]{d^2}. \tag{5.97} \]

The pole distribution of this structure is shown in Fig. 5.34. The block diagram of the 
recursive part is illustrated in Fig. 5.35. The increase in the pole density at $z = 1$, in contrast 
to previous pole distributions, is observed. The pole distributions of the Kingsbury and 
Zölzer structures show a decrease in the pole density for higher frequencies. For the pole 
density, a symmetry with respect to the imaginary axis as in the case of the direct-form 
structure and the Gold and Rader structure is not possible. But changing the sign in the 
recursive part of the difference equation results in a mirror image of the pole density. The 
mirror image can be achieved through a change of sign in the denominator polynomial. The 
denominator polynomial 

\[ D(z) = 1 \pm (2 - z_1z_2 - z_1^3)z^{-1} + (1 - z_1z_2)z^{-2} \tag{5.98} \]

shows that the real part depends on the coefficient of $z^{-1}$. 

Figure 5.34 Zölzer - pole distribution (6-bit quantization). 
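The relation between the pole position and the structure coefficients can be checked with a short sketch. It assumes a complex pole pair at radius $r$ and phase $\varphi$ as in (5.89), so that $b_1 = -2r\cos\varphi$ and $b_2 = r^2$; the function names and example values are illustrative. Both coefficient sets are converted back to $b_1$ and $b_2$ using the denominators of (5.91) and (5.94).

```python
import math

def kingsbury_coeffs(r, phi):
    """k1, k2 of (5.92), (5.93) for a pole at r * exp(j*phi)."""
    k1 = math.sqrt(1.0 - 2.0 * r * math.cos(phi) + r * r)  # distance d to z = 1
    return k1, (1.0 - r * r) / k1

def zoelzer_coeffs(r, phi):
    """z1, z2 of (5.95), (5.96) for the same pole."""
    b1, b2 = -2.0 * r * math.cos(phi), r * r
    z1 = (1.0 + b1 + b2) ** (1.0 / 3.0)
    return z1, (1.0 - b2) / z1

def denominator_kingsbury(k1, k2):
    """(b1, b2) implied by the denominator of (5.91)."""
    return -(2.0 - k1 * k2 - k1 * k1), 1.0 - k1 * k2

def denominator_zoelzer(z1, z2):
    """(b1, b2) implied by the denominator of (5.94)."""
    return -(2.0 - z1 * z2 - z1 ** 3), 1.0 - z1 * z2

r, phi = 0.99, 2.0 * math.pi * 100.0 / 48000.0   # pole for a 100 Hz filter at 48 kHz
k1, k2 = kingsbury_coeffs(r, phi)
z1, z2 = zoelzer_coeffs(r, phi)
b_kingsbury = denominator_kingsbury(k1, k2)
b_zoelzer = denominator_zoelzer(z1, z2)
```

For a pole close to $z = 1$ the distance $d$ is small, and $z_1 = d^{2/3}$ is considerably larger than $k_1 = d$, so a uniform quantization of $z_1$ resolves small distances more finely.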




Analytical Comparison of Noise Behavior of Different Filter Structures 

In this section, recursive filter structures are analyzed in terms of their noise behavior in 
fixed-point arithmetic [Z6189, Z6190, Z6194], The block diagrams provide the basis for an 
analytical calculation of noise power owing to the quantization of state variables. First of 
all, the general case is considered in which quantization is performed after multiplication. 
For this purpose, the transfer function G, (z) of every multiplier output to the output of the 
filter structure is determined. 

For this error analysis it is assumed that the signal within the filter structure covers the 
whole dynamic range so that the quantization error e, (n) is not correlated with the signal. 
Consecutive quantization error samples are not correlated with each other so that a uniform 
power density spectrum results [Sri77], It can also be assumed that different quantization 
errors e; (n) are uncorrelated within the filter structure. Owing to the uniform distribution 
of the quantization error, the variance can be given by 




(5.99) 
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The quantization error is added at every point of quantization and is filtered by the cor- 
responding transfer function G(z ) to the output of the filter. The variance of the output 
quantization noise (due to the noise source e(n)) is given by 



2 _ 2 
a ye ~ a E 



1 

2zfj Tz=e 



G(z)G(z~ 1 )z~' dz. 



(5.100) 



Exact solutions for the ring integral (5.100) can be found in [Jur64] for transfer functions
up to the fourth order. With the $L_2$ norm of a periodic function

$$\|G\|_2 = \left[\frac{1}{\pi}\int_0^{\pi} |G(e^{j\Omega})|^2\,d\Omega\right]^{1/2}, \tag{5.101}$$

the superposition of the noise variances leads to the total output noise variance

$$\sigma_{ye}^2 = \sigma_E^2 \sum_i \|G_i\|_2^2. \tag{5.102}$$

The signal-to-noise ratio for a full-range sinusoid can be written as

$$\mathrm{SNR} = 10\log_{10}\frac{0.5}{\sigma_{ye}^2}\ \mathrm{dB}. \tag{5.103}$$

The ring integral

$$I_n = \frac{1}{2\pi j}\oint_{z=e^{j\Omega}} \frac{A(z)\,A(z^{-1})}{B(z)\,B(z^{-1})}\,z^{-1}\,dz \tag{5.104}$$

is given in [Jur64] for first-order systems by

$$G(z) = \frac{a_0 z + a_1}{b_0 z + b_1}, \tag{5.105}$$

$$I_1 = \frac{(a_0^2 + a_1^2)\,b_0 - 2a_0a_1b_1}{b_0\,(b_0^2 - b_1^2)}, \tag{5.106}$$

and for second-order systems by

$$G(z) = \frac{a_0 z^2 + a_1 z + a_2}{b_0 z^2 + b_1 z + b_2}, \tag{5.107}$$

$$I_2 = \frac{A_0 b_0 c_1 - A_1 b_0 b_1 + A_2\,(b_1^2 - b_2 c_1)}{b_0\,[(b_0^2 - b_2^2)\,c_1 - (b_0 b_1 - b_1 b_2)\,b_1]}, \tag{5.108}$$

$$A_0 = a_0^2 + a_1^2 + a_2^2, \tag{5.109}$$

$$A_1 = 2(a_0a_1 + a_1a_2), \tag{5.110}$$

$$A_2 = 2a_0a_2, \tag{5.111}$$

$$c_1 = b_0 + b_2. \tag{5.112}$$
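The closed form (5.108)-(5.112) can be cross-checked numerically, because by Parseval's relation the ring integral equals the sum of the squared impulse-response samples of $G(z)$. The following is a minimal sketch under that assumption; the function names and the coefficient values are hypothetical and chosen only for illustration.

```python
def jury_i2(a, b):
    """Closed-form ring integral for a second-order G(z), following (5.108)-(5.112)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    A0 = a0 * a0 + a1 * a1 + a2 * a2
    A1 = 2.0 * (a0 * a1 + a1 * a2)
    A2 = 2.0 * a0 * a2
    c1 = b0 + b2
    num = A0 * b0 * c1 - A1 * b0 * b1 + A2 * (b1 * b1 - b2 * c1)
    den = b0 * ((b0 * b0 - b2 * b2) * c1 - (b0 * b1 - b1 * b2) * b1)
    return num / den


def impulse_energy(a, b, n_samples=4000):
    """Sum of squared impulse-response samples of G(z) (time-domain counterpart)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    x1 = x2 = y1 = y2 = 0.0
    energy = 0.0
    for n in range(n_samples):
        x0 = 1.0 if n == 0 else 0.0
        # Difference equation of (a0 z^2 + a1 z + a2) / (b0 z^2 + b1 z + b2)
        y0 = (a0 * x0 + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2) / b0
        energy += y0 * y0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return energy


a = (0.2, 0.4, 0.2)    # hypothetical numerator coefficients
b = (1.0, -1.2, 0.5)   # hypothetical stable denominator (pole radius sqrt(0.5))
closed_form = jury_i2(a, b)
numeric = impulse_energy(a, b)
print(closed_form, numeric)
```

The truncated sum only converges for stable filters (poles inside the unit circle); the closer the poles are to the unit circle, the more samples are needed.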



In the following, an analysis of the noise behavior for different recursive filter structures 
is presented. The noise transfer functions of individual recursive parts are responsible for 
noise shaping.

Figure 5.36 Direct form with additive error signal.

Table 5.6 Direct form - (a) noise transfer function, (b) quadratic $L_2$ norm and (c) output noise
variance in the case of quantization after every multiplication.

(a) $G_1(z) = G_2(z) = \dfrac{1}{z^2 + b_1 z + b_2}$

(b) $\|G_1\|_2^2 = \|G_2\|_2^2 = \dfrac{1 + b_2}{1 - b_2}\,\dfrac{1}{(1 + b_2)^2 - b_1^2}$

(c) $\sigma_{ye}^2 = \sigma_E^2\; 2\,\dfrac{1 + b_2}{1 - b_2}\,\dfrac{1}{(1 + b_2)^2 - b_1^2}$



The error transfer function of a second-order direct-form structure (see Fig. 5.36) has 
only complex poles (see Table 5.6). 

The implementation of poles near the unit circle leads to high amplification of the
quantization error. The effect of the pole radius on the noise variance can be observed in
the equation for output noise variance. As the pole radius $r$ approaches 1, the coefficient
$b_2 = r^2$ approaches 1 and the factor $1/(1 - b_2)$ leads
to a huge increase in the output noise variance.
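The growth of the noise variance with the pole radius can be made concrete by evaluating expression (c) of Table 5.6 for a pole pair at radius $r$ and angle $\varphi$, with $b_1 = -2r\cos\varphi$ and $b_2 = r^2$. The sketch below is illustrative only: the quantization step $Q = 2^{-(w-1)}$ for a word length of $w = 16$ bits is an assumption, and the function names are hypothetical.

```python
import math


def direct_form_noise_gain(r, phi):
    """sigma_ye^2 / sigma_E^2 of the direct form, expression (c) of Table 5.6."""
    b1 = -2.0 * r * math.cos(phi)
    b2 = r * r
    return 2.0 * (1.0 + b2) / (1.0 - b2) / ((1.0 + b2) ** 2 - b1 ** 2)


def snr_db(noise_gain, word_length=16):
    """SNR for a full-range sinusoid, assuming Q = 2^-(w-1) and sigma_E^2 = Q^2/12."""
    q = 2.0 ** (-(word_length - 1))
    sigma_e2 = q * q / 12.0
    return 10.0 * math.log10(0.5 / (sigma_e2 * noise_gain))


phi = 0.05  # hypothetical pole angle in radians (low cutoff frequency)
for r in (0.9, 0.99, 0.999):
    gain = direct_form_noise_gain(r, phi)
    print(f"r = {r}: noise gain = {gain:.3e}, SNR = {snr_db(gain):.1f} dB")
```

Moving the poles toward the unit circle at a small pole angle raises the noise gain by orders of magnitude, which is the behavior the structures discussed next are designed to reduce.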

The Gold and Rader filter structure (Fig. 5.37) has an output noise variance that depends
on the pole radius (see Table 5.7) and is independent of the pole phase. The latter fact is
because of the uniform grid of the pole distribution. An additional zero on the real axis
($z = r\cos\varphi$) directly beneath the poles reduces the effect of the complex poles.

The Kingsbury filter (Fig. 5.38 and Table 5.8) and the Zolzer filter (Fig. 5.39 and
Table 5.9), which is derived from it, show that the noise variance depends on the pole
radius. The noise transfer functions have a zero at $z = 1$ in addition to the complex poles.
This zero reduces the amplifying effect of the poles near the unit circle at $z = 1$.

Figure 5.40 shows the signal-to-noise ratio versus the cutoff frequency for the four filter
structures presented above. The signals are quantized to 16 bits. Here, the poles move with
increasing cutoff frequency on the curve characterized by the Q-factor $Q_\infty = 0.7071$ in the
$z$-plane. For very small cutoff frequencies, the Zolzer filter shows an improvement of 3 dB
in terms of signal-to-noise ratio compared with the Kingsbury filter and an improvement of

Figure 5.37 Gold and Rader structure with additive error signals.



Table 5.7 Gold and Rader - (a) noise transfer function, (b) quadratic $L_2$ norm and (c) output noise
variance in the case of quantization after every multiplication.

(a) $G_1(z) = G_2(z) = \dfrac{r\sin\varphi}{z^2 - 2r\cos\varphi\,z + r^2}$

    $G_3(z) = G_4(z) = \dfrac{z - r\cos\varphi}{z^2 - 2r\cos\varphi\,z + r^2}$

(b) $\|G_1\|_2^2 = \|G_2\|_2^2 = \dfrac{1 + b_2}{1 - b_2}\,\dfrac{(r\sin\varphi)^2}{(1 + b_2)^2 - b_1^2}$

    $\|G_3\|_2^2 = \|G_4\|_2^2 = \dfrac{1}{1 - b_2}\,\dfrac{[1 + (r\cos\varphi)^2](1 + b_2) - b_1^2}{(1 + b_2)^2 - b_1^2}$

(c) $\sigma_{ye}^2 = \sigma_E^2\; 2\,\dfrac{1}{1 - b_2}$

Figure 5.38 Kingsbury structure with additive error signals.



6 dB compared with the Gold and Rader filter. Up to 5 kHz, the Zolzer filter yields better
results (see Fig. 5.41). From 6 kHz onwards, the reduction of pole density in this filter leads
to a decrease in the signal-to-noise ratio (see Fig. 5.41).

Table 5.8 Kingsbury - (a) noise transfer function, (b) quadratic $L_2$ norm and (c) output noise variance
in the case of quantization after every multiplication.

(a) $G_1(z) = \dfrac{-k_1 z}{z^2 - (2 - k_1k_2 - k_1^2)\,z + (1 - k_1k_2)}$

    $G_2(z) = \dfrac{-k_1 (z - 1)}{z^2 - (2 - k_1k_2 - k_1^2)\,z + (1 - k_1k_2)}$

    $G_3(z) = \dfrac{z - 1}{z^2 - (2 - k_1k_2 - k_1^2)\,z + (1 - k_1k_2)}$

(b) $\|G_1\|_2^2 = \dfrac{1}{k_1k_2}\,\dfrac{2 - k_1k_2}{2(2 - k_1k_2) - k_1^2}$

    $\|G_2\|_2^2 = \dfrac{k_1}{k_2}\,\dfrac{2}{2(2 - k_1k_2) - k_1^2}$

    $\|G_3\|_2^2 = \dfrac{1}{k_1k_2}\,\dfrac{2}{2(2 - k_1k_2) - k_1^2}$

(c) $\sigma_{ye}^2 = \sigma_E^2\,\dfrac{5 + 2b_1 + 3b_2}{(1 - b_2)(1 + b_2 - b_1)}$

Table 5.9 Zolzer - (a) noise transfer function, (b) quadratic L 2 norm and (c) output noise variance in 
the case of quantization after every multiplication. 



(a) 



Gi(z) 

G 2 (z) = G 3 (z) 
G 4 (z) 



Z 2 - (2 - Z!Z 2 - Zj)z + (1 - Z 1 Z 2 ) 

— Zl (z ~ 1) 

Z 2 - (2 - Z!Z2 - Zj)z + (1 - Z1Z2) 
Z - 1 

Z 2 - (2 - Z1Z2 - Zjlz + (1 - Z1Z2) 



(b) 



IIGi || 2 

l|G 2 |ll = l|G3||| 

IIG 4 III 



A 2 — Z\Z 2 

Z\Z 2 2 zj (2 - Z\Z 2 ) - Zj 



z\z 2 2z 2 (2 - ziz 2 ) - zf 



z\z 2 2zj (2 - z\z 2 ) - Zj 



2 2_6 + 4 (*J + *2) + ( 1 + *2)(1 + *1 +/> 2 ) 1/3 

a ye = a E 2 



(c) 



(1 - * 2)(1 + *2 — *1) 
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Figure 5.40 SNR vs. cutoff frequency - quantization of products (f c < 200 Hz). 




Figure 5.41 SNR vs. cutoff frequency - quantization of products ($f_c > 2$ kHz).

With regard to the implementation of these filters with digital signal processors, a
quantization after every multiplication is not necessary. Quantization takes place when the
accumulator has to be stored in memory. This can be seen in Figs 5.42-5.45 by introducing
quantizers where they really occur. The resulting output noise variances are also shown.
The signal-to-noise ratio is plotted versus the cutoff frequency in Figs 5.46 and 5.47. In
the case of direct-form and Gold and Rader filters, the signal-to-noise ratio increases by
3 dB whereas the output noise variance for the Kingsbury filter remains unchanged. The
Kingsbury filter and the Gold and Rader filters exhibit similar results up to a frequency of
200 Hz (see Fig. 5.46). The Zolzer filter demonstrates an improvement of 3 dB compared
with these structures. For frequencies of up to 2 kHz (see Fig. 5.47) it is seen that the
increased pole density leads to an improvement of the signal-to-noise ratio as well as a
reduced effect due to coefficient quantization.




Figure 5.42 Direct-form filter - quantization after accumulator. 




Figure 5.43 Gold and Rader filter - quantization after accumulator.

$$\sigma_{ye}^2 = \sigma_E^2\,\frac{5 + 2b_1 + 3b_2}{(1 - b_2)(1 + b_2 - b_1)}$$

Figure 5.44 Kingsbury filter - quantization after accumulator.

$$\sigma_{ye}^2 = \sigma_E^2\,\frac{2(2 + b_1 + b_2) + (1 + b_2)(1 + b_1 + b_2)^{1/3}}{(1 - b_2)(1 + b_2 - b_1)}$$

Figure 5.45 Zolzer filter - quantization after accumulator.




Figure 5.46 SNR vs. cutoff frequency - quantization after accumulator ( f c < 200 Hz). 

Noise Shaping in Recursive Filters 

Figure 5.47 SNR vs. cutoff frequency - quantization after accumulator ($f_c > 2$ kHz).

The analysis of the noise transfer function of different structures shows that for three
structures with low roundoff noise a zero at $z = 1$ occurs in the transfer functions $G(z)$
of the error signals in addition to the complex poles. This zero near the poles reduces
the amplifying effect of the pole. If it is now possible to introduce another zero into the
noise transfer function then the effect of the poles is compensated for to a larger extent.
The procedure of feeding back the quantization error as shown in Chapter 2 produces an
additional zero in the noise transfer function [Tra77, Cha78, Abu79, Bar82, Zol89]. The
feedback of the quantization error is first demonstrated with the help of the direct-form
structure as shown in Fig. 5.48. This generates a zero at $z = 1$ in the noise transfer function
given by

$$G_{1.O}(z) = \frac{1 - z^{-1}}{1 + b_1 z^{-1} + b_2 z^{-2}}. \tag{5.113}$$

The resulting variance $\sigma^2$ of the quantization error at the output of the filter is presented
in Fig. 5.48. In order to produce two zeros at $z = 1$, the quantization error is fed back
over two delays weighted 2 and $-1$ (see Fig. 5.48b). The noise transfer function is, hence,
given by

$$G_{2.O}(z) = \frac{1 - 2z^{-1} + z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}. \tag{5.114}$$

The signal-to-noise ratio of the direct form is plotted versus the cutoff frequency in
Fig. 5.49. Even a single zero significantly improves the signal-to-noise ratio in the direct
form. The coefficients $b_1$ and $b_2$ approach $-2$ and 1 respectively with the decrease of
the cutoff frequency. With this, the error is filtered with a second-order high-pass. The
introduction of the additional zeros in the noise transfer function only affects the noise
signal of the filter. The input signal is only affected by the transfer function $H(z)$. If the
feedback coefficients are chosen equal to the coefficients $b_1$ and $b_2$ in the denominator
polynomial, complex zeros are produced that are identical to the complex poles. The noise
transfer function $G(z)$ is then reduced to unity. The choice of complex zeros directly at the
location of the complex poles corresponds to double-precision arithmetic.
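The benefit of the zeros in (5.113) and (5.114) can be checked numerically by comparing the impulse-response energy of the noise transfer functions with closed-form values obtained by evaluating (5.108) for the numerators $z - 1$ and $(z - 1)^2$. The sketch below is a hedged illustration; the pole position and all function names are hypothetical.

```python
import math


def impulse_energy(num, b1, b2, n_samples=5000):
    """Energy of the impulse response of (n0 + n1 z^-1 + n2 z^-2)/(1 + b1 z^-1 + b2 z^-2)."""
    n0, n1, n2 = num
    x1 = x2 = y1 = y2 = 0.0
    energy = 0.0
    for n in range(n_samples):
        x0 = 1.0 if n == 0 else 0.0
        y0 = n0 * x0 + n1 * x1 + n2 * x2 - b1 * y1 - b2 * y2
        energy += y0 * y0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return energy


r, phi = 0.98, 0.05                    # hypothetical pole radius and angle
b1, b2 = -2.0 * r * math.cos(phi), r * r

no_shaping = impulse_energy((1.0, 0.0, 0.0), b1, b2)
first_order = impulse_energy((1.0, -1.0, 0.0), b1, b2)    # G_1.O(z), one zero at z = 1
second_order = impulse_energy((1.0, -2.0, 1.0), b1, b2)   # G_2.O(z), two zeros at z = 1

# Closed forms from (5.108) for the two shaped cases
closed_first = 2.0 / ((1.0 - b2) * (1.0 + b2 - b1))
closed_second = (6.0 + 2.0 * b1 - 2.0 * b2) / ((1.0 - b2) * (1.0 + b2 - b1))

for name, value in (("none", no_shaping), ("1 zero", first_order), ("2 zeros", second_order)):
    print(f"{name:8s}: noise gain = {value:.3f} ({10.0 * math.log10(value):.1f} dB)")
```

For poles close to $z = 1$ each additional zero lowers the noise gain substantially, in line with the curves of Fig. 5.49.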

In [Abu79] an improvement of noise behavior for the direct form in any desired location
of the $z$-plane is achieved by placing additional simple-to-implement complex zeros near
the poles. For implementing filter algorithms with digital signal processors, these kinds of
suboptimal zero are easily realized. Since the Gold and Rader, Kingsbury, and Zolzer filter
structures already have zeros in their respective noise transfer functions, it is sufficient to
use a simple feedback for the quantization error. By virtue of this extension, the block
diagrams in Figs 5.50, 5.51 and 5.52 are obtained.

a) $\sigma_{DF1}^2 = \sigma_E^2\,\dfrac{2}{(1 - b_2)(1 + b_2 - b_1)}$

b) $\sigma_{DF2}^2 = \sigma_E^2\,\dfrac{6 + 2b_1 - 2b_2}{(1 - b_2)(1 + b_2 - b_1)}$

Figure 5.48 Direct form with noise shaping.

Figure 5.49 SNR - Noise shaping in direct-form filter structures (abscissa: $f$/Hz).




Figure 5.50 Gold and Rader filter with noise shaping. 




2 2 (1 + *?)(( 1 + b 2 )( 6 - 2 b 2 ) + 2b\ + 8bi) + 2k\{\ + bi + b 2 ) 

a y e ~° E { i- b2){ i + b 2 -b l )(l + b 2 -b l ) 

Figure 5.51 Kingsbury filter with noise shaping. 

The effect of noise shaping on signal-to-noise ratio is shown in Figs 5.53 and 5.54. The 
almost ideal noise behavior of all filter structures for 16-bit quantization and very small 
cutoff frequencies can be observed. The effect of this noise shaping for increasing cutoff 
frequencies is shown in Fig. 5.54. The compensating effect of the two zeros at z = 1 is 
reduced. 
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2 _ 2 (1 + zf)((l + b 2 )( 6 - 2 b 2 ) + 2 bj + 8 bi) + 2zf(l + bi + b 2 ) 
a y e ~° E (i - b 2 ){\ + b 2 - b\)(l +b 2 - bi) 

Figure 5.52 Zolzer filter with noise shaping. 




Figure 5.53 SNR - noise shaping (20-200 Hz). 



Scaling

In a fixed-point implementation of a digital filter, a transfer function from the input of the
filter to a junction within the filter has to be determined, as well as the transfer function
from the input to the output. By scaling the input signal, it has to be guaranteed that the
signals remain within the number range at each junction and at the output.

In order to calculate scaling coefficients, different criteria can be used. The $L_p$ norm is
defined as

$$L_p = \|H\|_p = \left[\frac{1}{\pi}\int_0^{\pi} |H(e^{j\Omega})|^p\,d\Omega\right]^{1/p}, \tag{5.115}$$

and an expression for the $L_\infty$ norm follows for $p = \infty$:

$$L_\infty = \|H(e^{j\Omega})\|_\infty = \max_{0 \le \Omega \le \pi} |H(e^{j\Omega})|. \tag{5.116}$$

Figure 5.54 SNR - noise shaping (200-12000 Hz).

The $L_\infty$ norm represents the maximum of the amplitude frequency response. In general,
the modulus of the output is

$$|y(n)| \le \|H\|_p\,\|X\|_q \tag{5.117}$$

with

$$\frac{1}{p} + \frac{1}{q} = 1, \quad p, q \ge 1. \tag{5.118}$$

For the $L_1$, $L_2$ and $L_\infty$ norms the explanations in Table 5.10 can be used.

Table 5.10 Commonly used scaling.

p          q
1          $\infty$   Given max. value of input spectrum;
                      scaling w.r.t. the $L_1$ norm of $H(e^{j\Omega})$
$\infty$   1          Given $L_1$ norm of input spectrum $X(e^{j\Omega})$;
                      scaling w.r.t. the $L_\infty$ norm of $H(e^{j\Omega})$
2          2          Given $L_2$ norm of input spectrum $X(e^{j\Omega})$;
                      scaling w.r.t. the $L_2$ norm of $H(e^{j\Omega})$

With

$$|y_i(n)| \le \|H_i(e^{j\Omega})\|_\infty\,\|X(e^{j\Omega})\|_1, \tag{5.119}$$

the $L_\infty$ norm is given by

$$L_\infty = \|h_i\|_\infty = \max_{k=0}^{\infty} |h_i(k)|. \tag{5.120}$$

For a sinusoidal input signal of amplitude 1 we get $\|X(e^{j\Omega})\|_1 = 1$. For $|y_i(n)| \le 1$ to be
valid, the scaling factor must be chosen as

$$S_i = \frac{1}{\|H_i(e^{j\Omega})\|_\infty}. \tag{5.121}$$



The scaling of the input signal is carried out with the maximum of the amplitude frequency
response with the goal that for $|x(n)| \le 1$, $|y_i(n)| \le 1$. As a scaling coefficient for the input
signal the highest scaling factor $S_i$ is chosen. To determine the maximum of the transfer
function

$$\|H(e^{j\Omega})\|_\infty = \max_{0 \le \Omega \le \pi} |H(e^{j\Omega})| \tag{5.122}$$

of a second-order system

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}} = \frac{a_0 z^2 + a_1 z + a_2}{z^2 + b_1 z + b_2},$$

the maximum value can be calculated as

$$|H(e^{j\Omega})|^2 = \frac{\alpha_0\cos^2(\Omega) + \alpha_1\cos(\Omega) + \alpha_2}{\cos^2(\Omega) + \beta_1\cos(\Omega) + \beta_2} = S^2 \tag{5.123}$$

with the abbreviations

$$\alpha_0 = \frac{a_0a_2}{b_2}, \quad \alpha_1 = \frac{a_1(a_0 + a_2)}{2b_2}, \quad \alpha_2 = \frac{(a_0 - a_2)^2 + a_1^2}{4b_2}, \quad \beta_1 = \frac{b_1(1 + b_2)}{2b_2}, \quad \beta_2 = \frac{(1 - b_2)^2 + b_1^2}{4b_2}.$$

With $x = \cos(\Omega)$ it follows that

$$(S^2 - \alpha_0)\,x^2 + (\beta_1 S^2 - \alpha_1)\,x + (\beta_2 S^2 - \alpha_2) = 0. \tag{5.124}$$

The solution of (5.124) leads to $x = \cos(\Omega_{max/min})$ which must be real ($-1 \le x \le 1$) for
the maximum/minimum to occur at a real frequency. For a single solution (repeated roots)
of the above quadratic equation, the discriminant must be $D = (p/2)^2 - q = 0$ (for $x^2 + px + q = 0$).
It follows that

$$D = \frac{(\beta_1 S^2 - \alpha_1)^2}{4\,(S^2 - \alpha_0)^2} - \frac{\beta_2 S^2 - \alpha_2}{S^2 - \alpha_0} = 0 \tag{5.125}$$

and

$$S^4(\beta_1^2 - 4\beta_2) + S^2(4\alpha_2 + 4\alpha_0\beta_2 - 2\alpha_1\beta_1) + (\alpha_1^2 - 4\alpha_0\alpha_2) = 0. \tag{5.126}$$

The solution of (5.126) gives two solutions for $S^2$. The solution with the larger value is
chosen. If the discriminant $D$ is not greater than zero, the maximum lies at $x = 1$ ($z = 1$)
or $x = -1$ ($z = -1$) as given by

$$S^2 = \frac{\alpha_0 + \alpha_1 + \alpha_2}{1 + \beta_1 + \beta_2} \tag{5.127}$$

or

$$S^2 = \frac{\alpha_0 - \alpha_1 + \alpha_2}{1 - \beta_1 + \beta_2}. \tag{5.128}$$
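The procedure of (5.123)-(5.128) can be sketched in a few lines and compared against a brute-force search over the frequency axis. This is an illustrative sketch, not a definitive implementation: the function names and the example coefficients are hypothetical, and it assumes $b_2 \neq 0$ and a stable filter.

```python
import cmath
import math


def peak_gain_closed_form(a, b1, b2):
    """Maximum of |H(e^jW)| for a second-order system following (5.123)-(5.128)."""
    a0, a1, a2 = a
    al0 = a0 * a2 / b2
    al1 = a1 * (a0 + a2) / (2.0 * b2)
    al2 = ((a0 - a2) ** 2 + a1 ** 2) / (4.0 * b2)
    be1 = b1 * (1.0 + b2) / (2.0 * b2)
    be2 = ((1.0 - b2) ** 2 + b1 ** 2) / (4.0 * b2)
    # Candidates at x = 1 and x = -1, see (5.127) and (5.128)
    candidates = [(al0 + al1 + al2) / (1.0 + be1 + be2),
                  (al0 - al1 + al2) / (1.0 - be1 + be2)]
    # Quadratic in S^2, see (5.126)
    qa = be1 ** 2 - 4.0 * be2
    qb = 4.0 * al2 + 4.0 * al0 * be2 - 2.0 * al1 * be1
    qc = al1 ** 2 - 4.0 * al0 * al2
    disc = qb * qb - 4.0 * qa * qc
    if qa != 0.0 and disc >= 0.0:
        for s2 in ((-qb + math.sqrt(disc)) / (2.0 * qa),
                   (-qb - math.sqrt(disc)) / (2.0 * qa)):
            if s2 > 0.0 and abs(s2 - al0) > 1e-15:
                # Repeated root of (5.124): x = -p/2
                x = -(be1 * s2 - al1) / (2.0 * (s2 - al0))
                if -1.0 <= x <= 1.0:
                    candidates.append(s2)
    return math.sqrt(max(candidates))


def peak_gain_grid(a, b1, b2, n_points=20001):
    """Brute-force maximum of |H(e^jW)| on a dense grid over 0 <= W <= pi."""
    a0, a1, a2 = a
    best = 0.0
    for i in range(n_points):
        zi = cmath.exp(-1j * math.pi * i / (n_points - 1))   # z^-1 on the unit circle
        h = (a0 + a1 * zi + a2 * zi * zi) / (1.0 + b1 * zi + b2 * zi * zi)
        best = max(best, abs(h))
    return best


r, phi = 0.9, math.pi / 4              # hypothetical pole radius and angle
b1, b2 = -2.0 * r * math.cos(phi), r * r
a = (1.0, 0.5, 0.25)                   # hypothetical numerator coefficients
s_closed = peak_gain_closed_form(a, b1, b2)
s_grid = peak_gain_grid(a, b1, b2)
scale = 1.0 / s_closed                 # scaling factor in the sense of (5.121)
print(s_closed, s_grid, scale)
```

The closed form returns the exact peak, while the grid search can only approach it from below; the two agree to within the grid resolution.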
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Limit Cycles 

Limit cycles are periodic processes in a filter which can be measured as sinusoidal signals. 
They arise owing to the quantization of state variables. The different types of limit cycle 
and the methods necessary to prevent them are briefly listed below: 

• overflow limit cycles
  → saturation curve
  → scaling

• limit cycles for vanishing input
  → noise shaping
  → dithering

• limit cycles correlated with the input signal
  → noise shaping
  → dithering.



5.3 Nonrecursive Audio Filters 

To implement linear phase audio filters, nonrecursive filters are used. The basis of an
efficient implementation is the fast convolution

$$y(n) = x(n) * h(n) \;\circ\!\!-\!\!\bullet\; Y(k) = X(k) \cdot H(k), \tag{5.129}$$

where the convolution in the time domain is performed by transforming the signal and
the impulse response into the frequency domain, multiplying the corresponding Fourier
transforms and transforming the product back into the time domain by an inverse Fourier
transform (see Fig. 5.55). The transform is carried out by a discrete Fourier transform of
length $N$, such that $N = N_1 + N_2 - 1$ is valid and time-domain aliasing is avoided. First we
discuss the basics. We then introduce the convolution of long sequences followed by a filter
design for linear phase filters.

Figure 5.55 Fast convolution of signal $x(n)$ of length $N_1$ and impulse response $h(n)$ of length $N_2$
delivers the convolution result $y(n) = x(n) * h(n)$ of length $N_1 + N_2 - 1$.
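The relation (5.129) can be illustrated with a small sketch in which both sequences are zero-padded to $N = N_1 + N_2 - 1$ before the transform. A naive $O(N^2)$ DFT is used here only to keep the example self-contained; in practice an FFT routine would take its place. All function names and signal values are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (or IDFT including the 1/N factor) of a sequence."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def fast_convolve(x, h):
    """Convolution via multiplication of DFTs of length N1 + N2 - 1."""
    n = len(x) + len(h) - 1
    xp = list(x) + [0.0] * (n - len(x))   # zero-padding avoids time-domain aliasing
    hp = list(h) + [0.0] * (n - len(h))
    spectrum = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [v.real for v in dft(spectrum, inverse=True)]


def direct_convolve(x, h):
    """Reference time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


x = [1.0, 2.0, -1.0, 0.5]      # hypothetical signal, N1 = 4
h = [0.5, 0.25, -0.125]        # hypothetical impulse response, N2 = 3
y_fast = fast_convolve(x, h)
y_direct = direct_convolve(x, h)
```

Without the zero-padding to at least $N_1 + N_2 - 1$ samples the product of the transforms would correspond to a cyclic convolution, and the tail of the result would wrap around onto its beginning.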










5.3.1 Basics of Fast Convolution 



IDFT Implementation with DFT Algorithm. The discrete Fourier transformation (DFT)
is described by

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk} = \mathrm{DFT}_k[x(n)], \tag{5.130}$$

$$W_N = e^{-j2\pi/N}, \tag{5.131}$$

and the inverse discrete Fourier transformation (IDFT) by

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W_N^{-nk}. \tag{5.132}$$

Suppressing the scaling factor $1/N$, we write

$$x'(n) = \sum_{k=0}^{N-1} X(k)\,W_N^{-nk} = \mathrm{IDFT}_n[X(k)], \tag{5.133}$$

so that the following symmetrical transformation algorithms hold:

$$X'(k) = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x(n)\,W_N^{nk}, \tag{5.134}$$

$$x(n) = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} X'(k)\,W_N^{-nk}. \tag{5.135}$$

The IDFT differs from the DFT only by the sign in the exponential term.

An alternative approach for calculating the IDFT with the help of a DFT is described
as follows [Cad87, Duh88]. We will make use of the relationships

$$x(n) = a(n) + j \cdot b(n), \tag{5.136}$$

$$j \cdot x^*(n) = b(n) + j \cdot a(n). \tag{5.137}$$

Conjugating (5.133) gives

$$x'^*(n) = \sum_{k=0}^{N-1} X^*(k)\,W_N^{nk}. \tag{5.138}$$

The multiplication of (5.138) by $j$ leads to

$$j \cdot x'^*(n) = \sum_{k=0}^{N-1} j \cdot X^*(k)\,W_N^{nk}. \tag{5.139}$$

Conjugating and multiplying (5.139) by $j$ results in

$$x'(n) = j \cdot \left[\sum_{k=0}^{N-1} \big(j \cdot X^*(k)\big)\,W_N^{nk}\right]^*. \tag{5.140}$$

An interpretation of (5.137) and (5.140) suggests the following way of performing the IDFT
with the DFT algorithm:

1. Exchange the real with the imaginary part of the spectral sequence

$$\tilde{Y}(k) = Y_I(k) + jY_R(k).$$

2. Transform with DFT algorithm

$$\mathrm{DFT}[\tilde{Y}(k)] = y_I(n) + jy_R(n).$$

3. Exchange the real with the imaginary part of the time sequence

$$y(n) = y_R(n) + jy_I(n).$$

For implementation on a digital signal processor, reusing the DFT routine for the IDFT saves program memory.
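The three steps above can be sketched as follows. A naive forward DFT stands in for whatever DFT routine is available, the $1/N$ factor suppressed in (5.133) is applied at the end, and all names and test values are hypothetical.

```python
import cmath


def dft(x):
    """Naive forward DFT, X(k) = sum_n x(n) W_N^(nk) with W_N = exp(-j 2 pi / N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft_via_dft(spectrum):
    """IDFT computed with the forward DFT only, by swapping real and imaginary parts."""
    n = len(spectrum)
    # Step 1: exchange real and imaginary part of the spectral sequence
    swapped = [complex(v.imag, v.real) for v in spectrum]
    # Step 2: forward DFT
    transformed = dft(swapped)
    # Step 3: exchange real and imaginary part again, then apply the 1/N scaling
    return [complex(v.imag, v.real) / n for v in transformed]


x = [1 + 2j, -0.5j, 3 + 0j, 0.25 - 1j]   # hypothetical complex test sequence
x_rec = idft_via_dft(dft(x))
```

Only one transform routine is needed in program memory; the two swaps are simple data moves that can often be folded into addressing.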

Discrete Fourier Transformation of Two Real Sequences. In many applications, stereo
signals that consist of a left and right channel are processed. With the help of the DFT, both
channels can be transformed simultaneously into the frequency domain [Sor87, Ell82].

For a real sequence $x(n)$ we obtain

$$X(k) = X^*(-k), \quad k = 0, 1, \ldots, N-1 \tag{5.141}$$

$$\phantom{X(k)} = X^*(N-k). \tag{5.142}$$

For a discrete Fourier transformation of two real sequences $x(n)$ and $y(n)$, a complex
sequence is first formed according to

$$z(n) = x(n) + jy(n). \tag{5.143}$$

The Fourier transformation gives

$$\mathrm{DFT}[z(n)] = \mathrm{DFT}[x(n) + jy(n)] = Z_R(k) + jZ_I(k) \tag{5.144}$$

$$\phantom{\mathrm{DFT}[z(n)]} = Z(k), \tag{5.145}$$

where

$$Z(k) = Z_R(k) + jZ_I(k) \tag{5.146}$$

$$\phantom{Z(k)} = X_R(k) + jX_I(k) + j[Y_R(k) + jY_I(k)] \tag{5.147}$$

$$\phantom{Z(k)} = X_R(k) - Y_I(k) + j[X_I(k) + Y_R(k)]. \tag{5.148}$$

Since $x(n)$ and $y(n)$ are real sequences, it follows from (5.142) that

$$Z(N-k) = Z_R(N-k) + jZ_I(N-k) = X^*(k) + jY^*(k) \tag{5.149}$$

$$\phantom{Z(N-k)} = X_R(k) - jX_I(k) + j[Y_R(k) - jY_I(k)] \tag{5.150}$$

$$\phantom{Z(N-k)} = X_R(k) + Y_I(k) - j[X_I(k) - Y_R(k)]. \tag{5.151}$$

Considering the real part of $Z(k)$, adding (5.148) and (5.151) gives

$$2X_R(k) = Z_R(k) + Z_R(N-k) \tag{5.152}$$

$$\rightarrow X_R(k) = \tfrac{1}{2}[Z_R(k) + Z_R(N-k)], \tag{5.153}$$

and subtraction of (5.148) from (5.151) results in

$$2Y_I(k) = Z_R(N-k) - Z_R(k) \tag{5.154}$$

$$\rightarrow Y_I(k) = \tfrac{1}{2}[Z_R(N-k) - Z_R(k)]. \tag{5.155}$$

Considering the imaginary part of $Z(k)$, adding (5.148) and (5.151) gives

$$2Y_R(k) = Z_I(k) + Z_I(N-k) \tag{5.156}$$

$$\rightarrow Y_R(k) = \tfrac{1}{2}[Z_I(k) + Z_I(N-k)], \tag{5.157}$$

and subtraction of (5.151) from (5.148) results in

$$2X_I(k) = Z_I(k) - Z_I(N-k) \tag{5.158}$$

$$\rightarrow X_I(k) = \tfrac{1}{2}[Z_I(k) - Z_I(N-k)]. \tag{5.159}$$

Hence, the spectral functions are given by

$$X(k) = \mathrm{DFT}[x(n)] = X_R(k) + jX_I(k) \tag{5.160}$$

$$\phantom{X(k)} = \tfrac{1}{2}[Z_R(k) + Z_R(N-k)] + j\tfrac{1}{2}[Z_I(k) - Z_I(N-k)], \quad k = 0, 1, \ldots, \tfrac{N}{2}, \tag{5.161}$$

$$Y(k) = \mathrm{DFT}[y(n)] = Y_R(k) + jY_I(k) \tag{5.162}$$

$$\phantom{Y(k)} = \tfrac{1}{2}[Z_I(k) + Z_I(N-k)] + j\tfrac{1}{2}[Z_R(N-k) - Z_R(k)], \quad k = 0, 1, \ldots, \tfrac{N}{2}, \tag{5.163}$$

and

$$X_R(k) + jX_I(k) = X_R(N-k) - jX_I(N-k), \tag{5.164}$$

$$Y_R(k) + jY_I(k) = Y_R(N-k) - jY_I(N-k), \quad k = \tfrac{N}{2}+1, \ldots, N-1. \tag{5.165}$$
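The separation formulas (5.161) and (5.163) can be sketched as follows. The example applies them for all $k$ with the index $N - k$ taken modulo $N$, which is equivalent to using the symmetry (5.164)-(5.165) for the upper half. The naive DFT, the function names and the test signals are hypothetical choices for illustration.

```python
import cmath


def dft(x):
    """Naive forward DFT of a (possibly complex) sequence."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def dft_two_real(x, y):
    """Spectra X(k) and Y(k) of two real sequences from one complex DFT."""
    n = len(x)
    z_spec = dft([complex(a, b) for a, b in zip(x, y)])   # z(n) = x(n) + j y(n)
    x_spec, y_spec = [], []
    for k in range(n):
        zk = z_spec[k]
        zn = z_spec[(n - k) % n]                          # Z(N - k), with Z(N) = Z(0)
        x_spec.append(complex(0.5 * (zk.real + zn.real), 0.5 * (zk.imag - zn.imag)))
        y_spec.append(complex(0.5 * (zk.imag + zn.imag), 0.5 * (zn.real - zk.real)))
    return x_spec, y_spec


left = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.1]     # hypothetical left channel
right = [0.3, -0.7, 1.5, 0.0, 0.9, -0.2, 0.4, -1.1]     # hypothetical right channel
x_spec, y_spec = dft_two_real(left, right)
x_ref, y_ref = dft(left), dft(right)
```

One complex transform of length $N$ plus $O(N)$ additions replaces two separate transforms of the real channels.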



Fast Convolution if Spectral Functions are Known. The spectral functions $X(k)$, $Y(k)$
and $H(k)$ are known. With the help of (5.148), the spectral sequence can be formed by

$$Z(k) = Z_R(k) + jZ_I(k) \tag{5.166}$$

$$\phantom{Z(k)} = X_R(k) - Y_I(k) + j[X_I(k) + Y_R(k)], \quad k = 0, 1, \ldots, N-1. \tag{5.167}$$

Filtering is done by multiplication in the frequency domain:

$$Z'(k) = [Z_R(k) + jZ_I(k)][H_R(k) + jH_I(k)]$$

$$\phantom{Z'(k)} = Z_R(k)H_R(k) - Z_I(k)H_I(k) + j[Z_R(k)H_I(k) + Z_I(k)H_R(k)]. \tag{5.168}$$

The inverse transformation gives

$$z'(n) = [x(n) + jy(n)] * h(n) = x(n) * h(n) + jy(n) * h(n) \tag{5.169}$$

$$\phantom{z'(n)} = \mathrm{IDFT}[Z'(k)] = z'_R(n) + jz'_I(n), \tag{5.170}$$

so that the filtered output sequence is given by

$$x'(n) = z'_R(n), \tag{5.171}$$

$$y'(n) = z'_I(n). \tag{5.172}$$

The filtering of a stereo signal can hence be done by transformation into the frequency
domain, multiplication of the spectral functions and inverse transformation of left and right
channels.

5.3.2 Fast Convolution of Long Sequences 

The fast convolution of two real input sequences $x_l(n)$ and $x_{l+1}(n)$ of length $N_1$ with the
impulse response $h(n)$ of length $N_2$ leads to the output sequences

$$y_l(n) = x_l(n) * h(n), \tag{5.173}$$

$$y_{l+1}(n) = x_{l+1}(n) * h(n), \tag{5.174}$$

of length $N_1 + N_2 - 1$. The implementation of a nonrecursive filter with fast convolution
becomes more efficient than the direct implementation of an FIR filter for filter lengths
$N > 30$. Therefore the following procedure will be performed:

• Formation of a complex sequence

$$z(n) = x_l(n) + jx_{l+1}(n). \tag{5.175}$$

• Fourier transformation of the impulse response $h(n)$ that is padded with zeros to a
length $N \ge N_1 + N_2 - 1$,

$$H(k) = \mathrm{DFT}[h(n)] \quad \text{(FFT-length } N\text{)}. \tag{5.176}$$

• Fourier transformation of the sequence $z(n)$ that is padded with zeros to a length
$N \ge N_1 + N_2 - 1$,

$$Z(k) = \mathrm{DFT}[z(n)] \quad \text{(FFT-length } N\text{)}. \tag{5.177}$$

• Formation of a complex output sequence

$$e(n) = \mathrm{IDFT}[Z(k)H(k)] \tag{5.178}$$

$$\phantom{e(n)} = z(n) * h(n) \tag{5.179}$$

$$\phantom{e(n)} = x_l(n) * h(n) + jx_{l+1}(n) * h(n). \tag{5.180}$$

• Formation of a real output sequence

$$y_l(n) = \mathrm{Re}\{e(n)\}, \tag{5.181}$$

$$y_{l+1}(n) = \mathrm{Im}\{e(n)\}. \tag{5.182}$$
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Figure 5.56 Fast convolution with partitioning of the input signal x(n) into blocks of length L. 



For the convolution of an infinite-length input sequence (see Fig. 5.56) with an impulse
response $h(n)$, the input sequence is partitioned into sequences $x_m(n)$ of length $L$:

$$x_m(n) = \begin{cases} x(n), & (m-1)L \le n \le mL - 1, \\ 0, & \text{otherwise}. \end{cases} \tag{5.183}$$

The input sequence is given by superposition of finite-length sequences according to

$$x(n) = \sum_{m=1}^{\infty} x_m(n). \tag{5.184}$$

The convolution of the input sequence with the impulse response $h(n)$ of length $M$ gives

$$y(n) = \sum_{k=0}^{M-1} h(k)\,x(n-k) \tag{5.185}$$

$$\phantom{y(n)} = \sum_{k=0}^{M-1} h(k) \sum_{m=1}^{\infty} x_m(n-k) \tag{5.186}$$

$$\phantom{y(n)} = \sum_{m=1}^{\infty} \left[\sum_{k=0}^{M-1} h(k)\,x_m(n-k)\right]. \tag{5.187}$$

The term in brackets corresponds to the convolution of a finite-length sequence $x_m(n)$
of length $L$ with the impulse response of length $M$. The output signal can be given as
superposition of convolution products of length $L + M - 1$. With these partial convolution
products

$$y_m(n) = \begin{cases} \displaystyle\sum_{k=0}^{M-1} h(k)\,x_m(n-k), & (m-1)L \le n \le mL + M - 2, \\ 0, & \text{otherwise}, \end{cases} \tag{5.188}$$

the output signal can be written as

$$y(n) = \sum_{m=1}^{\infty} y_m(n). \tag{5.189}$$
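The superposition (5.188)-(5.189) is the overlap-add method. The sketch below partitions the input into blocks of length $L$, convolves each block with $h(n)$ and adds the partial results at the block offsets; for brevity each block is convolved directly in the time domain, whereas a fast convolution would be used in practice. Function names and signal values are hypothetical.

```python
def convolve(x, h):
    """Direct time-domain convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def overlap_add(x, h, block_len):
    """Convolution by partitioning x into blocks x_m(n) and summing the partial results y_m(n)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]       # x_m(n), length L (last block may be shorter)
        partial = convolve(block, h)             # y_m(n), length L + M - 1
        for i, value in enumerate(partial):
            y[start + i] += value                # overlapping tails are added, see (5.189)
    return y


x = [float((7 * n) % 5 - 2) for n in range(23)]  # hypothetical input sequence
h = [0.8 ** n for n in range(6)]                 # hypothetical impulse response, M = 6
y_blocks = overlap_add(x, h, block_len=4)
y_full = convolve(x, h)
```

Each partial result is $M - 1$ samples longer than its input block, so consecutive results overlap by $M - 1$ samples; these overlapping tails are what the addition in (5.189) accumulates.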
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h,(n) 


Figure 5.57 Partitioning of the impulse response $h(n)$ into $h_1(n), h_2(n), \ldots, h_P(n)$.

If the length $M$ of the impulse response is very long, it can be similarly partitioned into
$P$ parts each of length $M/P$ (see Fig. 5.57). With

$$h_p\!\left(n - (p-1)\frac{M}{P}\right) = \begin{cases} h(n), & (p-1)\dfrac{M}{P} \le n \le p\dfrac{M}{P} - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{5.190}$$

it follows that

$$h(n) = \sum_{p=1}^{P} h_p\!\left(n - (p-1)\frac{M}{P}\right). \tag{5.191}$$

With $M_p = pM/P$ and (5.189) the following partitioning can be done:

$$y(n) = \sum_{m=1}^{\infty} \underbrace{\left[\sum_{k=0}^{M-1} h(k)\,x_m(n-k)\right]}_{y_m(n)} \tag{5.192}$$

$$\phantom{y(n)} = \sum_{m=1}^{\infty} \left[\sum_{k=0}^{M_1-1} h(k)\,x_m(n-k) + \sum_{k=M_1}^{M_2-1} h(k)\,x_m(n-k) + \cdots + \sum_{k=M_{P-1}}^{M-1} h(k)\,x_m(n-k)\right]. \tag{5.193}$$

This can be rewritten as

$$y(n) = \sum_{m=1}^{\infty} \Bigg[\underbrace{\sum_{k=0}^{M_1-1} h_1(k)\,x_m(n-k)}_{y_{m1}(n)} + \underbrace{\sum_{k=0}^{M_1-1} h_2(k)\,x_m(n-M_1-k)}_{y_{m2}(n-M_1)} + \underbrace{\sum_{k=0}^{M_1-1} h_3(k)\,x_m(n-2M_1-k)}_{y_{m3}(n-2M_1)} + \cdots + \underbrace{\sum_{k=0}^{M_1-1} h_P(k)\,x_m(n-(P-1)M_1-k)}_{y_{mP}(n-(P-1)M_1)}\Bigg]$$

$$\phantom{y(n)} = \sum_{m=1}^{\infty} \underbrace{\left[y_{m1}(n) + y_{m2}(n-M_1) + \cdots + y_{mP}(n-(P-1)M_1)\right]}_{y_m(n)}. \tag{5.194}$$

An example of partitioning the impulse response into $P = 4$ parts is graphically shown in
Fig. 5.58. This leads to

$$y(n) = \sum_{m=1}^{\infty} \left[\sum_{k=0}^{M_1-1} h_1(k)\,x_m(n-k) + \sum_{k=0}^{M_1-1} h_2(k)\,x_m(n-M_1-k) + \sum_{k=0}^{M_1-1} h_3(k)\,x_m(n-2M_1-k) + \sum_{k=0}^{M_1-1} h_4(k)\,x_m(n-3M_1-k)\right]. \tag{5.195}$$

Figure 5.58 Scheme for a fast convolution with $P = 4$.

The procedure of a fast convolution by partitioning the input sequence $x(n)$ as well as the
impulse response $h(n)$ is given in the following for the example in Fig. 5.58.
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1. Decomposition of the impulse response h(n ) of length 4 M: 

$$h_1(n) = h(n), \quad 0 \le n \le M-1, \tag{5.196}$$

$$h_2(n-M) = h(n), \quad M \le n \le 2M-1, \tag{5.197}$$

$$h_3(n-2M) = h(n), \quad 2M \le n \le 3M-1, \tag{5.198}$$

$$h_4(n-3M) = h(n), \quad 3M \le n \le 4M-1. \tag{5.199}$$

2. Zero-padding of partial impulse responses up to a length $2M$:

$$h_1(n) = \begin{cases} h_1(n), & 0 \le n \le M-1, \\ 0, & M \le n \le 2M-1, \end{cases} \tag{5.200}$$

$$h_2(n) = \begin{cases} h_2(n), & 0 \le n \le M-1, \\ 0, & M \le n \le 2M-1, \end{cases} \tag{5.201}$$

$$h_3(n) = \begin{cases} h_3(n), & 0 \le n \le M-1, \\ 0, & M \le n \le 2M-1, \end{cases} \tag{5.202}$$

$$h_4(n) = \begin{cases} h_4(n), & 0 \le n \le M-1, \\ 0, & M \le n \le 2M-1. \end{cases} \tag{5.203}$$

3. Calculating and storing

$$H_i(k) = \mathrm{DFT}[h_i(n)], \quad i = 1, \ldots, 4 \quad \text{(FFT-length } 2M\text{)}. \tag{5.204}$$

4. Decomposition of the input sequence $x(n)$ into partial sequences $x_l(n)$ of length $M$:

$$x_l(n) = x(n), \quad (l-1)M \le n \le lM-1, \quad l = 1, \ldots, \infty. \tag{5.205}$$

5. Nesting partial sequences:

$$z_m(n) = x_l(n) + jx_{l+1}(n), \quad m = 1, \ldots, \infty. \tag{5.206}$$

6. Zero-padding of complex sequence $z_m(n)$ up to a length $2M$:

$$z_m(n) = \begin{cases} z_m(n), & (l-1)M \le n \le lM-1, \\ 0, & lM \le n \le (l+1)M-1. \end{cases} \tag{5.207}$$

7. Fourier transformation of the complex sequences $z_m(n)$:

$$Z_m(k) = \mathrm{DFT}[z_m(n)] = Z_{mR}(k) + jZ_{mI}(k) \quad \text{(FFT-length } 2M\text{)}. \tag{5.208}$$

8. Multiplication in the frequency domain:

$$[Z_R(k) + jZ_I(k)][H_R(k) + jH_I(k)] = Z_R(k)H_R(k) - Z_I(k)H_I(k) + j[Z_R(k)H_I(k) + Z_I(k)H_R(k)], \tag{5.209}$$

$$E_{m1}(k) = Z_m(k)H_1(k), \quad k = 0, 1, \ldots, 2M-1, \tag{5.210}$$

$$E_{m2}(k) = Z_m(k)H_2(k), \quad k = 0, 1, \ldots, 2M-1, \tag{5.211}$$

$$E_{m3}(k) = Z_m(k)H_3(k), \quad k = 0, 1, \ldots, 2M-1, \tag{5.212}$$

$$E_{m4}(k) = Z_m(k)H_4(k), \quad k = 0, 1, \ldots, 2M-1. \tag{5.213}$$

9. Inverse transformation:

$$e_{m1}(n) = \mathrm{IDFT}[Z_m(k)H_1(k)], \quad n = 0, 1, \ldots, 2M-1, \tag{5.214}$$

$$e_{m2}(n) = \mathrm{IDFT}[Z_m(k)H_2(k)], \quad n = 0, 1, \ldots, 2M-1, \tag{5.215}$$

$$e_{m3}(n) = \mathrm{IDFT}[Z_m(k)H_3(k)], \quad n = 0, 1, \ldots, 2M-1, \tag{5.216}$$

$$e_{m4}(n) = \mathrm{IDFT}[Z_m(k)H_4(k)], \quad n = 0, 1, \ldots, 2M-1. \tag{5.217}$$

10. Determination of partial convolutions:

$$\mathrm{Re}\{e_{m1}(n)\} = x_l * h_1, \tag{5.218}$$

$$\mathrm{Im}\{e_{m1}(n)\} = x_{l+1} * h_1, \tag{5.219}$$

$$\mathrm{Re}\{e_{m2}(n)\} = x_l * h_2, \tag{5.220}$$

$$\mathrm{Im}\{e_{m2}(n)\} = x_{l+1} * h_2, \tag{5.221}$$

$$\mathrm{Re}\{e_{m3}(n)\} = x_l * h_3, \tag{5.222}$$

$$\mathrm{Im}\{e_{m3}(n)\} = x_{l+1} * h_3, \tag{5.223}$$

$$\mathrm{Re}\{e_{m4}(n)\} = x_l * h_4, \tag{5.224}$$

$$\mathrm{Im}\{e_{m4}(n)\} = x_{l+1} * h_4. \tag{5.225}$$

11. Overlap-add of partial sequences, increment $l = l + 2$ and $m = m + 1$, and back to
step 5.
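The eleven steps can be sketched compactly as follows. This is an illustrative sketch rather than an optimized implementation: a naive DFT replaces the FFT, the block length $M = 4$ and the signals are hypothetical, the input length is assumed to be a multiple of $2M$, and the impulse response length is assumed to be a multiple of $M$. Blocks are indexed from zero in the code.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT or IDFT (with 1/N factor); stands in for an FFT of length 2M."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def direct_convolve(x, h):
    """Reference time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def partitioned_fast_convolve(x, h, m):
    """Fast convolution with partitioned input and partitioned impulse response."""
    p = len(h) // m
    # Steps 1-3: partition h, zero-pad each part to 2M, transform and store
    h_spec = [dft(list(h[i * m:(i + 1) * m]) + [0.0] * m) for i in range(p)]
    y = [0.0] * (len(x) + len(h) - 1)
    for l in range(0, len(x) // m, 2):
        # Steps 4-6: nest two consecutive real blocks into one zero-padded complex block
        block_a = x[l * m:(l + 1) * m]
        block_b = x[(l + 1) * m:(l + 2) * m]
        z = [complex(a, b) for a, b in zip(block_a, block_b)] + [0j] * m
        z_spec = dft(z)                                   # step 7
        for i in range(p):
            # Steps 8-9: multiply with H_i(k) and transform back
            e = dft([zk * hk for zk, hk in zip(z_spec, h_spec[i])], inverse=True)
            # Steps 10-11: real/imaginary parts are the partial convolutions; overlap-add
            for n in range(2 * m - 1):
                y[l * m + i * m + n] += e[n].real
                y[(l + 1) * m + i * m + n] += e[n].imag
    return y


M = 4
x = [float((7 * n) % 5 - 2) for n in range(4 * M)]   # hypothetical input, four blocks
h = [0.9 ** n for n in range(4 * M)]                 # hypothetical impulse response, P = 4
y_part = partitioned_fast_convolve(x, h, M)
y_ref = direct_convolve(x, h)
```

Each partial result is added at the offset of its input block plus the offset $(i-1)M$ of its impulse response part, which is the time shift appearing in (5.194).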

Based on the partitioning of the input signal and the impulse response and the following
Fourier transform, the result of each single convolution is only available after a delay of
one block of samples. Different methods to reduce computational complexity or overcome
the block delay have been proposed [Soo90, Gar95, Ege96, Mül99, Mül01, Garc02]. These
methods make use of a hybrid approach where the first part of the impulse response is used
for time-domain convolution and the other parts are used for fast convolution in the
frequency domain. Figure 5.59a,b demonstrates a simple derivation of the hybrid convolution
scheme, which can be described by the decomposition of the transfer function as

$$H(z) = \sum_{i=0}^{M-1} z^{-iN} H_i(z), \tag{5.226}$$

where the impulse response has length $M \cdot N$ and $M$ is the number of smaller partitions
of length $N$. Figure 5.59c,d shows two different signal flow graphs for the decomposition
given by (5.226) of the entire transfer function. In particular, Fig. 5.59d highlights (with
gray background) that in each branch $i = 1, \ldots, M-1$ a delay of $i \cdot N$ occurs and each
filter $H_i(z)$ has the same length and makes use of the same state variables. This means
that they can be computed in parallel in the frequency domain with $2N$-FFTs/IFFTs and
the outputs have to be delayed according to $(i-1) \cdot N$, as shown in Fig. 5.59e. A further
simplification shown in Fig. 5.59f leads to one input $2N$-FFT and block delays $z^{-1}$ for
the frequency vectors. Then, parallel multiplications with $H_i(k)$ of length $2N$ and the
summation of all intermediate products are performed before one output $2N$-IFFT for the
overlap-add operation in the time domain is used. The first part of the impulse response
represented by $H_0(z)$ is performed by direct convolution in the time domain. The frequency
and time domain parts are then overlapped and added. An alternative realization for fast
convolution is based on the overlap and save operation.




Figure 5.59 Hybrid fast convolution. 



5.3.3 Filter Design by Frequency Sampling 

Audio filter design for nonrecursive filter realizations by fast convolution can be carried out
by the frequency sampling method. For linear phase systems we obtain

$$H(e^{j\Omega}) = A(e^{j\Omega})\,e^{-j\frac{N_F-1}{2}\Omega}, \tag{5.227}$$

where $A(e^{j\Omega})$ is a real-valued amplitude response and $N_F$ is the length of the impulse
response. The magnitude $|H(e^{j\Omega})|$ is calculated by sampling in the frequency domain at
equidistant places

$$\frac{f}{f_S} = \frac{k}{N_F}, \quad k = 0, 1, \ldots, N_F - 1, \tag{5.228}$$

according to

$$|H(e^{j\Omega})| = A(e^{j2\pi k/N_F}), \quad k = 0, 1, \ldots, \frac{N_F}{2} - 1. \tag{5.229}$$

Hence, a filter can be designed by fulfilling conditions in the frequency domain. The linear
phase is determined as

$$e^{-j\frac{N_F-1}{2}\Omega} = e^{-j2\pi\frac{N_F-1}{2}\frac{k}{N_F}} \tag{5.230}$$

$$\phantom{e^{-j\frac{N_F-1}{2}\Omega}} = \cos\!\left(2\pi\frac{N_F-1}{2}\frac{k}{N_F}\right) - j\sin\!\left(2\pi\frac{N_F-1}{2}\frac{k}{N_F}\right), \quad k = 0, 1, \ldots, \frac{N_F}{2} - 1. \tag{5.231}$$

Owing to the real transfer function $H(z)$ for an even filter length, we have to fulfill

$$H\!\left(k = \frac{N_F}{2}\right) = 0 \;\wedge\; H(k) = H^*(N_F - k), \quad k = 0, 1, \ldots, \frac{N_F}{2} - 1. \tag{5.232}$$

This has to be taken into consideration while designing filters of even length $N_F$. The
impulse response $h(n)$ is obtained through an $N_F$-point IDFT of the spectral sequence
$H(k)$. This impulse response is extended with zero-padding to the length $N$ and then
transformed by an $N$-point DFT resulting in the spectral sequence $H(k)$ of the filter.

Example: For $N_F = 8$, $|H(k)| = 1$ ($k = 0, 1, 2, 3, 5, 6, 7$) and $|H(4)| = 0$, the group delay
is $t_G = 3.5$. Figure 5.60 shows the amplitude, real part and imaginary part of the transfer
function and the impulse response $h(n)$.
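The example can be reproduced with a short sketch of the frequency sampling method for even $N_F$: the amplitude samples are combined with the linear phase of (5.230), the conditions (5.232) are enforced, and an $N_F$-point IDFT yields the impulse response. Function and variable names are hypothetical, and a naive IDFT is used to keep the example self-contained.

```python
import cmath
import math


def frequency_sampling_design(amplitude):
    """Linear-phase FIR design for even N_F from amplitude samples A(k), k = 0..N_F/2 - 1."""
    nf = 2 * len(amplitude)
    spectrum = [0j] * nf                      # H(N_F/2) = 0 is kept, see (5.232)
    for k, a_k in enumerate(amplitude):
        # Amplitude times linear phase, see (5.227) and (5.230)
        spectrum[k] = a_k * cmath.exp(-2j * math.pi * (nf - 1) / 2.0 * k / nf)
    for k in range(1, nf // 2):
        spectrum[nf - k] = spectrum[k].conjugate()   # H(k) = H*(N_F - k)
    # N_F-point IDFT gives the impulse response
    impulse = []
    for n in range(nf):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * n * k / nf) for k in range(nf))
        impulse.append(acc / nf)
    return spectrum, impulse


spectrum, impulse = frequency_sampling_design([1.0, 1.0, 1.0, 1.0])   # N_F = 8
h = [v.real for v in impulse]
for n, value in enumerate(h):
    print(n, round(value, 4))
```

The resulting impulse response is real and symmetric about $n = 3.5$, which corresponds to the stated group delay of 3.5 samples.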



5.4 Multi-complementary Filter Bank 

The subband processing of audio signals is mainly used in source coding applications for
efficient transmission and storing. The basis for the subband decomposition is critically
sampled filter banks [Vai93, Fli00]. These filter banks allow a perfect reconstruction of
the input provided there is no processing within the subbands. They consist of an analysis
filter bank for decomposing the signal in critically sampled subbands and a synthesis filter
bank for reconstructing the broad-band output. The aliasing in the subbands is eliminated
by the synthesis filter bank. Nonlinear methods are used for coding the subband signals.
The reconstruction error of the filter bank is negligible compared with the errors due to the
coding/decoding process. Using a critically sampled filter bank as a multi-band equalizer,
multi-band dynamic range control or multi-band room simulation, the processing in the
subbands leads to aliasing at the output. In order to avoid aliasing, a multi-complementary
filter bank [Fli92, Zöl92, Fli00] is presented which enables an aliasing-free processing in
the subbands and leads to a perfect reconstruction of the output. It allows a decomposition
into octave frequency bands which are matched to the human ear.

5.4.1 Principles 

Figure 5.61 shows an octave-band filter bank with critical sampling. It performs a
successive low-pass/high-pass decomposition into half-bands followed by downsampling by a
factor 2. The decomposition leads to the subbands $Y_1$ to $Y_N$ (see Fig. 5.62). The transition
frequencies of this decomposition are given by

$$\Omega_{Ck} = \frac{\pi}{2}\,2^{-k+1}, \quad k = 1, 2, \ldots, N-1. \tag{5.233}$$

Figure 5.60 Filter design by frequency sampling ($N_F$ even).

In order to avoid aliasing in subbands, a modified octave-band filter bank is considered
which is shown in Fig. 5.63 for a two-band decomposition. The cutoff frequency of the
modified filter bank is moved from $\pi/2$ to a lower frequency. This means that in
downsampling the low-pass branch, no aliasing occurs in the transition band (e.g. cutoff frequency
$\pi/3$). The broader high-pass branch cannot be downsampled. A continuation of the two-band
decomposition described leads to the modified octave-band filter bank shown in Fig. 5.64.
The frequency bands are depicted in Fig. 5.65 showing that besides the cutoff frequencies

$$\Omega_{Ck} = \frac{\pi}{3}\,2^{-k+1}, \quad k = 1, 2, \ldots, N-1, \tag{5.234}$$

the bandwidth of the subbands is reduced by a factor 2. The high-pass subband $Y_1$ is an
exception.

The special low-pass/high-pass decomposition is carried out by a two-band
complementary filter bank as shown in Fig. 5.66. The frequency responses of a decimation filter
$H_D(z)$, interpolation filter $H_I(z)$ and kernel filter $H_K(z)$ are shown in Fig. 5.67.

Figure 5.61 Octave-band QMF filter bank (SP = signal processing, LP = low-pass, HP = high-pass).





Figure 5.64 Modified octave-band filter bank. 



The low-pass filtering of a signal $x_1(n)$ is done with the help of a decimation filter
$H_D(z)$, the downsampler of factor 2 and the kernel filter $H_K(z)$ and leads to $y_2(2n)$. The
Z-transform of $y_2(2n)$ is given by

$$Y_2(z) = \tfrac{1}{2}\left[H_D(z^{1/2})\,X_1(z^{1/2})\,H_K(z) + H_D(-z^{1/2})\,X_1(-z^{1/2})\,H_K(z)\right]. \tag{5.235}$$

Figure 5.65 Modified octave decomposition.




Figure 5.66 Two-band complementary filter bank. 




Figure 5.67 Design of $H_D(z)$, $H_I(z)$ and $H_K(z)$.



The interpolated low-pass signal $y_{1L}(n)$ is generated by upsampling by a factor 2 and
filtering with the interpolation filter $H_I(z)$. The Z-transform of $y_{1L}(n)$ is given by

$$Y_{1L}(z) = Y_2(z^2)\,H_I(z) \tag{5.236}$$

$$= \underbrace{\tfrac{1}{2}H_D(z)H_I(z)H_K(z^2)}_{G_1(z)}\,X_1(z) + \underbrace{\tfrac{1}{2}H_D(-z)H_I(z)H_K(z^2)}_{G_2(z)}\,X_1(-z). \tag{5.237}$$

The high-pass signal $y_1(n)$ is obtained by subtracting the interpolated low-pass signal
$y_{1L}(n)$ from the delayed input signal $x_1(n - D)$. The Z-transform of the high-pass signal
is given by

$$Y_1(z) = z^{-D}X_1(z) - Y_{1L}(z) \tag{5.238}$$

$$= [z^{-D} - G_1(z)]\,X_1(z) - G_2(z)\,X_1(-z). \tag{5.239}$$

The low-pass and high-pass signals are processed individually. The output signal $\hat{x}_1(n)$
is formed by adding the high-pass signal to the upsampled and filtered low-pass signal.
With (5.237) and (5.239) the Z-transform of $\hat{x}_1(n)$ can be written as

$$\hat{X}_1(z) = Y_{1L}(z) + Y_1(z) = z^{-D}X_1(z). \tag{5.240}$$

Equation (5.240) shows the perfect reconstruction of the input signal which is delayed by
$D$ sampling units.
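The two-band complementary structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the short filters `h_d`, `h_k`, `h_i` and the delay of 3 samples are arbitrary placeholders and not the designs of this chapter, and the helper names `fir` and `two_band_split` are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter; the output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


def two_band_split(x, h_d, h_k, h_i, delay):
    """Return (y1, y1l): the high-pass and the interpolated low-pass signal."""
    y2 = fir(fir(x, h_d)[::2], h_k)        # decimation filter, downsample by 2, kernel filter
    upsampled = [0.0] * len(x)
    upsampled[::2] = y2                    # upsample by 2 (zero insertion)
    y1l = fir(upsampled, h_i)              # interpolation filter
    delayed = [0.0] * delay + list(x[:len(x) - delay])
    y1 = [xd - lp for xd, lp in zip(delayed, y1l)]   # complementary high-pass
    return y1, y1l


x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 0.0, 4.0, -3.0, 1.0]
y1, y1l = two_band_split(x, h_d=[0.25, 0.5, 0.25], h_k=[0.5, 0.5],
                         h_i=[0.5, 1.0, 0.5], delay=3)
x_hat = [hp + lp for hp, lp in zip(y1, y1l)]   # no processing in the subbands
```

Because the high-pass branch is formed as the complement of the interpolated low-pass branch, the sum reproduces the delayed input for any choice of filters. In a real design the delay $D$ would be matched to the group delay of the low-pass path so that $y_1(n)$ actually has a high-pass characteristic.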
The extension to $N$ subbands and performing the kernel filter using complementary
techniques [Ram88, Ram90] leads to the multi-complementary filter bank as shown in
Fig. 5.68. Delays are integrated in the high-pass ($Y_1$) and band-pass subbands ($Y_2$ to $Y_{N-1}$)
in order to compensate the group delay. The filter structure consists of $N$ horizontal stages.
The kernel filter is implemented as a complementary filter in $S$ vertical stages. The design
of the latter will be discussed later. The vertical delays in the extended kernel filters (EKF$_1$
to EKF$_{N-1}$) compensate group delays caused by forming the complementary component.
At the end of each of these vertical stages is the kernel filter $H_K$. With

$$z_k = z^{2^{k-1}} \quad \text{and} \quad k = 1, \ldots, N, \tag{5.241}$$

the signals $\hat{X}_k(z_k)$ can be written as a function of the signals $X_k(z_k)$ as

$$\hat{\mathbf{X}} = \mathrm{diag}\left[z_1^{-D_1} \;\; z_2^{-D_2} \;\; \ldots \;\; z_N^{-D_N}\right]\mathbf{X}, \tag{5.242}$$

with

$$\hat{\mathbf{X}} = [\hat{X}_1(z_1) \;\; \hat{X}_2(z_2) \;\; \ldots \;\; \hat{X}_N(z_N)]^T,$$

$$\mathbf{X} = [X_1(z_1) \;\; X_2(z_2) \;\; \ldots \;\; X_N(z_N)]^T,$$

and with $k = N - l$ the delays are given by

$$D_{k=N} = 0, \tag{5.243}$$

$$D_{k=N-l} = 2D_{N-l+1} + D, \quad l = 1, \ldots, N-1. \tag{5.244}$$



Perfect reconstruction of the input signal can be achieved if the horizontal delays $D_{Hk}$ are
given by

$$D_{Hk=N} = 0,$$

$$D_{Hk=N-1} = 0,$$

$$D_{Hk=N-l} = 2D_{N-l+1}, \quad l = 2, \ldots, N-1.$$
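The delay recursion is easy to evaluate. The sketch below assumes the reading $D_N = 0$, $D_{N-l} = 2D_{N-l+1} + D$ for the band delays and $D_{H,N} = D_{H,N-1} = 0$, $D_{H,N-l} = 2D_{N-l+1}$ for the horizontal delays; the function name `band_delays` and the example values are hypothetical.

```python
def band_delays(num_bands, kernel_delay):
    """Return (d, d_h): band delays D_k and horizontal delays D_Hk, keyed by k."""
    d = {num_bands: 0}
    for l in range(1, num_bands):
        k = num_bands - l
        d[k] = 2 * d[k + 1] + kernel_delay     # D_{N-l} = 2 D_{N-l+1} + D
    d_h = {num_bands: 0}
    if num_bands >= 2:
        d_h[num_bands - 1] = 0
    for l in range(2, num_bands):
        k = num_bands - l
        d_h[k] = 2 * d[k + 1]                  # D_{H,N-l} = 2 D_{N-l+1}
    return d, d_h


d, d_h = band_delays(num_bands=4, kernel_delay=10)
```

For four bands and $D = 10$ this gives the band delays 70, 30, 10, 0 (for $k = 1, \ldots, 4$), each expressed at the sampling rate of its own stage.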

The implementation of the extended vertical kernel filters is done by calculating com- 
plementary components as shown in Fig. 5.69. After upsampling, interpolating with a 
high-pass HP (Fig. 5.69b) and forming the complementary component, the kernel filter 
$H_K$ with frequency response as in Fig. 5.69a becomes low-pass with frequency response
as illustrated in Fig. 5.69c. The slope of the filter characteristic remains constant whereas 
the cutoff frequency is doubled. A subsequent upsampling with an interpolation high-pass 
(Fig. 5.69d) and complement filtering leads to the frequency response in Fig. 5.69e. With 
the help of this technique, the kernel filter is implemented at a reduced sampling rate. The 
cutoff frequency is moved to a desired cutoff frequency by using decimation/interpolation 
stages with complement filtering. 

Computational Complexity. For an $N$-band multi-complementary filter bank with $N-1$
decomposition filters where each is implemented by a kernel filter with $S$ stages, the
horizontal complexity is given by

$$HC = HC_1 + HC_2\left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{N-1}}\right). \tag{5.245}$$
Figure 5.68 Multi-complementary filter bank (horizontal stages HS$_1$ to HS$_N$, extended kernel filters EKF$_1$ to EKF$_{N-1}$).



$HC_1$ denotes the number of operations that are carried out at the input sampling rate. These
operations occur in the horizontal stage HS$_1$ (see Fig. 5.68). $HC_2$ denotes the number
of operations (horizontal stage HS$_2$) that are performed at half of the sampling rate. The
number of operations in the stages from HS$_2$ to HS$_N$ are approximately identical but are
calculated at sampling rates that are successively halved.

Figure 5.69 Multirate complementary filter.

The complexities $VC_1$ to $VC_{N-1}$ of the vertical kernel filters EKF$_1$ to EKF$_{N-1}$ are
calculated as

$$VC_1 = \frac{1}{2}V_1 + V_2\left(\frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^{S+1}}\right),$$

$$VC_2 = \frac{1}{4}V_1 + V_2\left(\frac{1}{8} + \frac{1}{16} + \cdots + \frac{1}{2^{S+2}}\right) = \frac{1}{2}VC_1,$$

$$VC_3 = \frac{1}{8}V_1 + V_2\left(\frac{1}{16} + \frac{1}{32} + \cdots + \frac{1}{2^{S+3}}\right) = \frac{1}{4}VC_1,$$

$$VC_{N-1} = \frac{1}{2^{N-1}}V_1 + V_2\left(\frac{1}{2^{N}} + \cdots + \frac{1}{2^{S+N-1}}\right) = \frac{1}{2^{N-2}}VC_1,$$

where $V_1$ depicts the complexity of the first stage VS$_1$ and $V_2$ is the complexity of the
second stage VS$_2$ (see Fig. 5.68). It can be seen that the total vertical complexity is given
by

$$VC = VC_1\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{N-2}}\right). \tag{5.246}$$

The upper bound of the total complexity is the sum of the horizontal and vertical
complexities and can be written as

$$C_{tot} = HC_1 + HC_2 + 2\,VC_1. \tag{5.247}$$





The upper bound $C_{tot}$ does not grow with the number of frequency bands $N$, and $VC_1$ itself
remains bounded as the number of vertical stages $S$ increases. This means that for a real-time
implementation with finite computation power, any desired number of subbands with
arbitrarily narrow transition bands can be implemented.
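The bound can be verified numerically. The following sketch assumes the geometric-series form of the complexities, namely $HC = HC_1 + HC_2\sum_{k=1}^{N-1}2^{-k}$, $VC_1 = V_1/2 + V_2\sum_{j=2}^{S+1}2^{-j}$ and $VC = VC_1\sum_{k=0}^{N-2}2^{-k}$; the operation counts passed in are arbitrary illustrative numbers and `total_complexity` is a hypothetical helper.

```python
def total_complexity(n_bands, s_stages, hc1, hc2, v1, v2):
    """Return (actual, bound): operations per input sample and the upper bound."""
    hc = hc1 + hc2 * sum(0.5 ** k for k in range(1, n_bands))
    vc1 = 0.5 * v1 + v2 * sum(0.5 ** j for j in range(2, s_stages + 2))
    vc = vc1 * sum(0.5 ** k for k in range(0, n_bands - 1))
    bound = hc1 + hc2 + 2.0 * vc1
    return hc + vc, bound


# Illustrative operation counts (assumed, not taken from the design example).
for n_bands in (2, 4, 8, 16):
    actual, bound = total_complexity(n_bands, s_stages=2,
                                     hc1=100.0, hc2=100.0, v1=60.0, v2=60.0)
    print(n_bands, round(actual, 2), round(bound, 2))
```

As the number of bands grows, the actual complexity approaches the bound from below but never exceeds it, since every additional stage runs at half the sampling rate of the previous one.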
5.4.2 Example: Eight-band Multi-complementary Filter Bank

In order to implement the frequency decomposition into the eight bands shown in Fig. 5.70,
the multirate filter structure of Fig. 5.71 is employed. The individual parts of the system
provide means of downsampling (D = decimation), upsampling (I = interpolation), kernel
filtering (K), signal processing (SP), delays ($N_1$ = Delay 1, $N_2$ = Delay 2) and group delay
compensation $M_i$ in the $i$th band. The frequency decomposition is carried out successively
from the highest to the lowest frequency band. In the two lowest frequency bands, a
compensation for group delay is not required. The slope of the filter response can be adjusted
with the kernel complementary filter structure shown in Fig. 5.72 which consists of one
stage. The specifications of an eight-band equalizer are listed in Table 5.11. The stop-band
attenuation of the subband filters is chosen to be 100 dB.




Figure 5.70 Modified octave decomposition of the frequency band.



Table 5.11 Transition frequencies $f_{Ci}$ and transition bandwidths TB in an eight-band equalizer.

f_S [kHz]   f_C1 [Hz]   f_C2 [Hz]   f_C3 [Hz]   f_C4 [Hz]   f_C5 [Hz]   f_C6 [Hz]   f_C7 [Hz]
44.1        7350        3675        1837.5      918.75      459.375     ≈230        ≈115
TB [Hz]     1280        640         320         160         80          40          20



Filter Design 

The design of different decimation and interpolation filters is mainly determined by the 
transition bandwidth and the stop-band attenuation for the lower frequency band. As an 
example, a design is made for an eight-band equalizer. The kernel complementary filter 
structure for both lower frequency bands is illustrated in Fig. 5.72. The design specifications 
for the kernel low-pass, decimation and interpolation filters are presented in Fig. 5.73. 

Kernel Filter Design. The transition bandwidth of the kernel filter is known if the transition
bandwidth is given for the lower frequency band. This kernel filter must be designed for
a sampling rate of $f_S'' = 44100/2^8$. For a given transition bandwidth $f_{TB}$ at a frequency
$f'' = f_S''/3$, the normalized pass-band frequency is

$$\frac{\Omega_{Pb}}{2\pi} = \frac{f'' - f_{TB}/2}{f_S''} \tag{5.248}$$

and the normalized stop-band frequency

$$\frac{\Omega_{Sb}}{2\pi} = \frac{f'' + f_{TB}/2}{f_S''}. \tag{5.249}$$
Figure 5.72 Kernel complementary filter structure.

With the help of these parameters the filter can be designed. Making use of the
Parks-McClellan program, the frequency response shown in Fig. 5.74 is obtained for a
transition bandwidth of $f_{TB} = 20$ Hz. The necessary filter length for a stop-band attenuation of
100 dB is 53 taps.



Decimation and Interpolation High-pass Filter. These filters are designed for a sampling
rate of $f_S' = 44100/2^7$ and are half-band filters as illustrated in Fig. 5.73. First a low-pass
filter is designed, followed by a low-pass to high-pass transformation. For a given transition
bandwidth $f_{TB}$, the normalized pass-band frequency is

$$\frac{\Omega_{Pb}}{2\pi} = \frac{f'' + f_{TB}/2}{f_S'} \tag{5.250}$$

and the normalized stop-band frequency is given by

$$\frac{\Omega_{Sb}}{2\pi} = \frac{2f'' - f_{TB}/2}{f_S'}. \tag{5.251}$$



With these parameters the design of a half-band filter is carried out. Figure 5.75 shows the 
frequency response. The necessary filter length for a stop-band attenuation of 100 dB is 
55 taps. 



Decimation and Interpolation Low-pass Filter. These filters are designed for a sampling
rate of $f_S = 44100/2^6$ and are also half-band filters. For a given transition bandwidth
$f_{TB}$, the normalized pass-band frequency is

$$\frac{\Omega_{Pb}}{2\pi} = \frac{2f'' + f_{TB}/2}{f_S} \tag{5.252}$$

and the normalized stop-band frequency is given by

$$\frac{\Omega_{Sb}}{2\pi} = \frac{4f'' - f_{TB}/2}{f_S}. \tag{5.253}$$

Figure 5.73 Decimation and interpolation filters.



With these parameters the design of a half-band filter is carried out. Figure 5.76 shows the 
frequency response. The necessary filter length for a stop-band attenuation of 100 dB is 
43 taps. These filter designs are used in every decomposition stage so that the transition 
frequencies and bandwidths are obtained as listed in Table 5.11. 
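The band edges of the three filter types follow directly from the sampling rates and the transition bandwidth. The sketch below assumes the pass-band/stop-band expressions $(f'' \mp f_{TB}/2)/f_S''$ for the kernel filter, $(f'' + f_{TB}/2)/f_S'$ and $(2f'' - f_{TB}/2)/f_S'$ for the high-pass prototype, and $(2f'' + f_{TB}/2)/f_S$ and $(4f'' - f_{TB}/2)/f_S$ for the low-pass filter; the function name `band_edges` and the dictionary keys are hypothetical.

```python
FS = 44100.0     # input sampling rate in Hz
F_TB = 20.0      # transition bandwidth of the lowest band in Hz


def band_edges(fs=FS, f_tb=F_TB):
    """Normalized (pass-band, stop-band) edges, i.e. frequency / sampling rate."""
    fs_kernel = fs / 2 ** 8          # kernel filter rate
    fs_hp = fs / 2 ** 7              # decimation/interpolation high-pass rate
    fs_lp = fs / 2 ** 6              # decimation/interpolation low-pass rate
    f_c = fs_kernel / 3              # cutoff frequency f'' of the kernel filter
    return {
        "kernel": ((f_c - f_tb / 2) / fs_kernel, (f_c + f_tb / 2) / fs_kernel),
        "dhp_ihp": ((f_c + f_tb / 2) / fs_hp, (2 * f_c - f_tb / 2) / fs_hp),
        "dlp_ilp": ((2 * f_c + f_tb / 2) / fs_lp, (4 * f_c - f_tb / 2) / fs_lp),
    }


edges = band_edges()
for name, (f_pass, f_stop) in edges.items():
    print(f"{name}: pass {f_pass:.4f}, stop {f_stop:.4f}")
```

For both decimation/interpolation filters the pass-band and stop-band edges add up to 0.5, which is the symmetry expected of a half-band filter. These edges could then be handed to a Parks-McClellan routine to obtain the coefficients.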

Memory Requirements and Latency Time. The memory requirements depend directly
on the transition bandwidth and the stop-band attenuation. Here, the memory operations
for the actual kernel, decimation and interpolation filters have to be differentiated from the
group delay compensations in the frequency bands. The compensating group delay $N_1$ for
decimation and interpolation high-pass filters of order $O_{DHP/IHP}$ is calculated with the help
of the kernel filter order $O_{KF}$ according to

$$N_1 = O_{KF} + O_{DHP/IHP}. \tag{5.254}$$
Figure 5.74 Kernel low-pass filter with a transition bandwidth of 20 Hz (kernel filter, N = 53).

Figure 5.75 Decimation and interpolation high-pass filter (interpolation/decimation filter 1, N = 55).



The group delay compensation $N_2$ for the decimation and interpolation low-pass filters of
order $O_{DLP/ILP}$ is given by

$$N_2 = 2N_1 + O_{DLP/ILP}. \tag{5.255}$$
Figure 5.76 Decimation and interpolation low-pass filter (interpolation/decimation filter 2, N = 43).



The delays $M_3, \ldots, M_8$ in the individual frequency bands are calculated recursively
starting from the two lowest frequency bands:

$$M_3 = 2N_2, \quad M_4 = 6N_2, \quad M_5 = 14N_2,$$

$$M_6 = 30N_2, \quad M_7 = 62N_2, \quad M_8 = 126N_2.$$

The memory requirements per decomposition stage are listed in Table 5.12. The memory
for the delays can be computed by $\sum_i M_i = 240N_2$. The latency time (delay) is given by
$t_D = (M_8/44100) \cdot 10^3$ ms ($t_D = 725$ ms).
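The quoted latency can be reproduced from the filter lengths of the design example. The sketch below makes two assumptions that are not stated explicitly in the text: the filter order is taken as the number of taps minus one, and the listed delays are generated by the recursion $M_{i+1} = 2M_i + 2N_2$ starting from $M_3 = 2N_2$, which matches the factors 2, 6, 14, 30, 62, 126. The function name `delay_budget` is hypothetical.

```python
def delay_budget(taps_kf=53, taps_dhp=55, taps_dlp=43, fs=44100.0):
    """Return (n1, n2, m, latency_ms) for the eight-band example."""
    o_kf, o_dhp, o_dlp = taps_kf - 1, taps_dhp - 1, taps_dlp - 1   # assumed: order = taps - 1
    n1 = o_kf + o_dhp                  # Delay 1
    n2 = 2 * n1 + o_dlp                # Delay 2
    m = {3: 2 * n2}
    for i in range(4, 9):
        m[i] = 2 * m[i - 1] + 2 * n2   # assumed recursion, reproduces 6, 14, 30, 62, 126
    latency_ms = m[8] / fs * 1e3
    return n1, n2, m, latency_ms


n1, n2, m, latency_ms = delay_budget()
print(n1, n2, sum(m.values()), round(latency_ms, 1))
```

With 53, 55 and 43 taps this gives $N_1 = 106$, $N_2 = 254$ and a latency of about 725.7 ms, in agreement with the 725 ms stated above.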



Table 5.12 Memory requirements.

Kernel filter   O_KF
DHP/IHP         2 · O_DHP/IHP
DLP/ILP         3 · O_DLP/ILP
N_1             O_KF + O_DHP/IHP
N_2             2 · N_1 + O_DLP/ILP



5.5 Java Applet - Audio Filters 

The applet shown in Fig. 5.77 demonstrates audio filters. It is designed for a first insight into 
the perceptual effect of filtering an audio signal. Besides the different filter types and their 
acoustical effect, the applet offers a first insight into the logarithmic behavior of loudness 
and frequency resolution of our human acoustical perception. 
The following filter functions can be selected on the lower right of the graphical user
interface:

• Low-/high-pass filter (LP/HP) with control parameter

- cutoff frequency $f_c$ in hertz (lower horizontal slider)

- all frequencies above (LP) or below (HP) the cutoff frequency are attenuated
according to the shown frequency response.

• Low/high-frequency shelving filter (LFS/HFS) with control parameters

- cutoff frequency $f_c$ in hertz (lower horizontal slider)

- boost/cut in dB (left vertical slider with + for boost or − for cut)

- all frequencies below (LFS) or above (HFS) the cutoff frequency are boosted/
cut according to the selected boost/cut.

• Peak filter with control parameters

- center frequency $f_c$ in hertz (lower horizontal slider)

- boost/cut in dB (left vertical slider with + for boost or − for cut)

- Q-factor $Q = f_c/f_b$ (right vertical slider), which controls the bandwidth $f_b$ of
the boost/cut around the adjusted center frequency $f_c$. A lower Q-factor means
wider bandwidth.

- the peak filter boosts/cuts the center frequency with a bandwidth adjusted by
the Q-factor.

The center window shows the frequency response (filter gain versus frequency) of the 
selected filter functions. You can choose between a linear and a logarithmic frequency 
axis. 

You can choose between two predefined audio files from our web server (audio1.wav
or audio2.wav) or your own local wav file to be processed [Gui05].

5.6 Exercises 

1. Design of Recursive Audio Filters 

1. How can we design a low-frequency shelving filter? Which parameters define the 
filter? Explain the control parameters. 

2. How can we derive a high-frequency shelving filter? Which parameters define the 
filter? 

3. What is the difference between first- and second-order shelving filters?

4. How can we design a peak filter? Which parameters define the filter? What is the
filter order? Explain the control parameters. Explain the Q-factor.

5. How do we derive the digital transfer function? 

6. Derive the digital transfer functions for the first-order shelving filters. 
Figure 5.77 Java applet - audio filters ((c) 2004 ANT, Helmut Schmidt University Hamburg, Germany).



2. Parametric Audio Filters 

1. What is the basic idea of parametric filters?

2. What is the difference between the Regalia and the Zölzer filter structures? Count
the number of multiplications and additions for both filter structures.

3. Derive a signal flow graph for first- and second-order parametric Zölzer filters with
a direct-form implementation of the all-pass filters.

4. Is there a complete decoupling of all control parameters for boost and cut? Which 
parameters are decoupled? 

3. Shelving Filter: Direct Form 

Derive a first-order low shelving filter from a purely band-limiting first-order low-pass 
filter. Use a bilinear transform and give the transfer function of the low shelving filter. 

1. Write down what you know about the filter coefficients and calculate the poles/zeros
as functions of $V_0$ and $T$. What gain factor do you have if $z = \pm 1$?

2. What is the difference between purely band-limiting filters and the shelving filter? 

3. How can you describe the boost and cut effect related to poles/zeros of the filter? 

4. How do we get a transfer function for the boost case from the cut case? 

5. How do we go from a low shelving filter to a high shelving filter? 
4. Shelving Filter: All-pass Form

Implement a first-order high shelving filter for the boost and cut cases with the sampling
rate $f_S = 44.1$ kHz, cutoff frequency $f_c = 10$ kHz and gain $G = 12$ dB.

1. Define the all-pass parameters and coefficients for the boost and cut cases. 

2. Derive from the all-pass decomposition the complete transfer function of the shelving 
filter. 

3. Using Matlab, give the magnitude frequency response for boost and cut. Show the 
result for the case where a boost and cut filter are in a series connection. 

4. If the input signal to the system is a unit impulse, give the spectrum of the input and
output signal for the boost and cut cases. What result do you expect in this case when
boost and cut are again cascaded?



5. Quantization of Filter Coefficients 

For the quantization of the filter coefficients different methods have been proposed: direct
form, Gold and Rader, Kingsbury and Zölzer.

1. What is the motivation behind this?

2. Plot a pole distribution using the quantized polar representation of a second-order
IIR filter

$$H(z) = \frac{N(z)}{1 - 2r\cos\varphi\,z^{-1} + r^2 z^{-2}}.$$

6. Signal Quantization inside the Audio Filter 

Now we combine coefficient and signal quantization. 

1. Design a digital high-pass filter (second-order IIR), with a cutoff frequency $f_c = 50$ Hz.
(Use the Butterworth, Chebyshev or elliptic design methods implemented in
Matlab.)

2. Quantize the signal only when it leaves the accumulator (i.e. before it is saved in any 
state variable). 

3. Now quantize the coefficients (direct form), too. 

4. Extend your quantization to every arithmetic operation (i.e. after each addition/ 
multiplication). 
7. Quantization Effects in Recursive Audio Filters

1. Why is the quantization of signals inside a recursive filter of special interest?

2. Derive the noise transfer function of the second-order direct-form filter. Apply a first-
and second-order noise shaping to the quantizer inside the direct-form structure and
discuss its influence. What is the difference between second-order noise shaping and
double-precision arithmetic?

3. Write a Matlab implementation of a second-order filter structure for quantization and 
noise shaping. 

8. Fast Convolution 

For an input sequence $x(n)$ of length $N_1 = 500$ and the impulse response $h(n)$ of length
$N_2 = 31$, perform the discrete-time convolution.

1. Give the discrete-time convolution sum formula.

2. Using Matlab, define $x(n)$ as a sum of two sinusoids and derive $h(n)$ with Matlab
function remez(..).

3. Realize the filter operation with Matlab using: 

• the function conv (x, h) 

• the sample-by-sample convolution sum method 

• the FFT method 

• the FFT with overlap-add method. 

4. Describe FIR filtering with the fast convolution technique. What conditions do the 
input signal and the impulse responses have to fulfill if convolution is performed by 
equivalent frequency-domain processing? 

5. What happens if input signal and impulse response are as long as the FFT transform 
length? 

6. How can we perform the IFFT by the FFT algorithm? 

7. Explain the processing steps 

• for a segmentation of the input signal into blocks and fast convolution; 

• for a stereo signal by the fast convolution technique; 

• for the segmentation of the impulse response. 

8. What is the processing delay of the fast convolution technique? 

9. Write a Matlab program for fast convolution. 

10. How does quantization of the signal influence the roundoff noise behavior of an FIR 
filter? 
9. FIR Filter Design by Frequency Sampling

1. Why is frequency sampling an important design method for audio equalizers? How 
do we sample magnitude and phase response? 

2. What is the linear phase frequency response of a system? What is the effect on an 
input signal passing through such a system? 

3. Explain the derivation of the magnitude and phase response for a linear phase FIR 
filter. 

4. What is the condition for a real-valued impulse response of even length N ? What is 
the group delay? 

5. Write a Matlab program for the design of an FIR filter and verify the example in the 
book. 

6. If the desired frequency response is an ideal low-pass filter of length $N_F = 31$ with
cutoff frequency $\Omega_c = \pi/2$, derive the impulse response of this system. What will
the result be for $N_F = 32$ and $\Omega_c = \pi$?

10. Multi-complementary Filter Bank 

1. What is an octave-spaced frequency splitting and how can we design a filter bank for
that task? 

2. How can we perform aliasing-free subband processing? How can we achieve narrow 
transition bands for a filter bank? What is the computational complexity of an octave- 
spaced filter bank? 
Chapter 6

Room Simulation 



Room simulation artificially reproduces the acoustics of a room. The foundations of room 
acoustics are found in [Cre78, Kut91]. Room simulation is mainly used for post-processing 
signals in which a microphone is located in the vicinity of an instrument or a voice. The 
direct signal, without additional room impression, is mapped to a certain acoustical room, 
for example a concert hall or a church. In terms of signal processing, the post-processing 
of an audio signal with room simulation corresponds to the convolution of the audio signal 
with a room impulse response. 



6.1 Basics 



6.1.1 Room Acoustics 



The room impulse response between two points in a room can be classified as shown in 
Fig. 6.1. The impulse response consists of the direct signal, early reflections (from walls) 
and subsequent reverberation. The number of early reflections continuously increases with 
time and leads to a random signal with exponential decay called subsequent reverberation. 
The reverberation time (decrease in sound pressure level by 60 dB) can be calculated, using 
the geometry of the room and the partial areas that absorb sound in the room, from 



$$T_{60} = 0.163\,\frac{V}{\sum_n \alpha_n S_n}, \tag{6.1}$$

where $T_{60}$ is the reverberation time (in s), $V$ the volume of the room (m$^3$), $S_n$ the partial
areas (m$^2$) and $\alpha_n$ the absorption coefficient of partial area $S_n$.
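As a small worked example, the reverberation time can be evaluated for a list of partial areas. The sketch assumes the Sabine form $T_{60} = 0.163\,V/\sum_n \alpha_n S_n$ with SI units; the function name `reverberation_time` and the room data are hypothetical.

```python
def reverberation_time(volume_m3, surfaces):
    """Sabine reverberation time in seconds.

    surfaces: iterable of (area in m^2, absorption coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.163 * volume_m3 / absorption


# Hypothetical 10 m x 10 m x 10 m room: walls and ceiling plus an absorbing floor.
t60 = reverberation_time(1000.0, [(500.0, 0.05), (100.0, 0.35)])
print(round(t60, 2))
```

Increasing the absorption coefficient of any partial area enlarges the denominator and therefore shortens the reverberation time.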

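The geometry of the room also determines the eigenfrequencies of a three-dimensional
rectangular room:

$$f_e = \frac{c}{2}\sqrt{\left(\frac{n_x}{l_x}\right)^2 + \left(\frac{n_y}{l_y}\right)^2 + \left(\frac{n_z}{l_z}\right)^2}, \tag{6.2}$$

where $n_x$, $n_y$, $n_z$ are the integer numbers of half waves (0, 1, 2, ...), $l_x$, $l_y$, $l_z$ are the
dimensions of the rectangular room, and $c$ is the velocity of sound.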
Figure 6.1 Room impulse response $h(n)$ and simplified decomposition into direct signal, early
reflections and subsequent reverberation (with $|h(n)|$).



For larger rooms, the eigenfrequencies start from very low frequencies. In contrast, the
lowest eigenfrequencies of smaller rooms are shifted toward higher frequencies. The mean
frequency spacing between two extrema of the frequency response of a large room is
approximately inversely proportional to the reverberation time [Schr87]:

$$\Delta f \sim 1/T_{60}. \tag{6.3}$$

The distance between two eigenfrequencies decreases with increasing number of half
waves. Above a critical frequency

$$f_c > 4000\sqrt{T_{60}/V}, \tag{6.4}$$

the density of eigenfrequencies becomes so large that they overlap each other [Schr87].
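The two relations can be explored numerically. The sketch below assumes the standard eigenfrequency formula for a rectangular room, $f = \frac{c}{2}\sqrt{(n_x/l_x)^2 + (n_y/l_y)^2 + (n_z/l_z)^2}$, and the critical frequency $4000\sqrt{T_{60}/V}$; the speed of sound of 343 m/s, the function names and the room dimensions are illustrative assumptions.

```python
import math


def eigenfrequency(mode, dims, c=343.0):
    """Eigenfrequency in Hz for mode (n_x, n_y, n_z) in a room of size dims (m)."""
    return 0.5 * c * math.sqrt(sum((n / l) ** 2 for n, l in zip(mode, dims)))


def critical_frequency(t60, volume):
    """Frequency in Hz above which the eigenfrequencies overlap."""
    return 4000.0 * math.sqrt(t60 / volume)


room = (5.0, 4.0, 3.0)                               # hypothetical room in metres
lowest = sorted(eigenfrequency((nx, ny, nz), room)
                for nx in range(3) for ny in range(3) for nz in range(3)
                if (nx, ny, nz) != (0, 0, 0))[:5]
print([round(f, 1) for f in lowest])
print(round(critical_frequency(0.5, 5.0 * 4.0 * 3.0), 1))
```

The lowest mode is determined by the longest room dimension, which illustrates why the eigenfrequencies of larger rooms start at lower frequencies.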



6.1.2 Model-based Room Impulse Responses 

The methods for analytically determining a room impulse response are based on the ray
tracing model [Schr70] or image model [All79]. In the case of the ray tracing model, a
point source with radial emission is assumed. The path length of rays and the absorption 
coefficients of walls, roofs and floors are used to determine the room impulse response (see 
Fig. 6.2). For the image model, image rooms with secondary image sources are formed 
which in turn have further image rooms and image sources. The summation of all image 
sources with corresponding delays and attenuations provides the estimated room impulse 
response. Both methods are applied in room acoustics to get insight into the acoustical 
properties when planning concert halls, theaters, etc. 
Figure 6.2 Model-based methods for calculating room impulse responses: (a) ray tracing, (b) virtual image sources.



6.1.3 Measurement of Room Impulse Responses 

The direct measurement of a room impulse response is carried out by impulse excitation.
Better measurement results are obtained by correlation measurement of room impulse
responses by using pseudo-random sequences as the excitation signal. Pseudo-random
sequences can be generated by feedback shift registers [Mac76]. The pseudo-random
sequence is periodic with period $L = 2^N - 1$, where $N$ is the number of stages of the shift
register. The autocorrelation function (ACF) of such a random sequence is given by

$$r_{XX}(n) = \begin{cases} a^2, & n = 0, L, 2L, \ldots, \\ -\dfrac{a^2}{L}, & \text{elsewhere,} \end{cases} \tag{6.5}$$

where $a$ is the maximum value of the pseudo-random sequence. The ACF also has a
period $L$. After going through a DA converter, the pseudo-random signal is fed through
a loudspeaker into a room (see Fig. 6.3).




Figure 6.3 Measurement of room impulse response with pseudo-random signal x(t). 

At the same time, the pseudo-random signal and the room signal captured by a
microphone are recorded on a personal computer. The impulse response is obtained with the
cyclic cross-correlation

$$r_{XY}(n) = r_{XX}(n) * h(n) \approx h(n). \tag{6.6}$$

For the measurement of room impulse responses it has to be borne in mind that the periodic
length of the pseudo-random sequence must be longer than the length of the room impulse
response. Otherwise, aliasing in the periodic cross-correlation $r_{XY}(n)$ (see Fig. 6.4) occurs.

Figure 6.4 Periodic autocorrelation of pseudo-random sequence and periodic cross-correlation.



To improve the signal-to-noise ratio of the measurement, the average of several periods of 
the cross-correlation is calculated. 
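The correlation measurement can be illustrated with a very short sequence. The sketch below assumes a four-stage Fibonacci shift register with feedback taken from stages 4 and 3 (period $L = 15$), amplitude $a = 1$, and a normalization of the cyclic correlation by $L$; the three-tap "room" response and all function names are hypothetical.

```python
def mls(num_stages=4, taps=(4, 3)):
    """One period of a maximum-length sequence with values +1/-1."""
    state = [1] + [0] * (num_stages - 1)
    seq = []
    for _ in range(2 ** num_stages - 1):
        seq.append(1.0 - 2.0 * state[-1])        # map bit 0/1 to +1/-1
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]             # XOR of the tapped stages
        state = [feedback] + state[:-1]          # shift the register
    return seq


def cyclic_correlation(x, y):
    """r_xy(n) = (1/L) * sum_k x(k) * y((k + n) mod L)."""
    L = len(x)
    return [sum(x[k] * y[(k + n) % L] for k in range(L)) / L for n in range(L)]


def cyclic_convolution(x, h):
    """Steady-state response of a system h to the periodic excitation x."""
    L = len(x)
    return [sum(h[m] * x[(n - m) % L] for m in range(len(h))) for n in range(L)]


x = mls()
r_xx = cyclic_correlation(x, x)
h_room = [1.0, 0.5, 0.25]                        # hypothetical short impulse response
r_xy = cyclic_correlation(x, cyclic_convolution(x, h_room))
```

The autocorrelation takes only the two values $a^2$ and $-a^2/L$, so `r_xy` approximates `h_room` up to a small offset of order $1/L$. With longer shift registers this offset becomes negligible.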

Sine sweep measurements [Far00, Mül01, Sta02] are based on a chirp signal $x_S(t)$ of
length $T_C$ and an inverse signal $x_{Sinv}(t)$, where both satisfy the condition

$$x_S(t) * x_{Sinv}(t) = \delta(t - T_C). \tag{6.7}$$

The chirp signal can be applied to the room by a loudspeaker and then a signal inside the
room $y(t) = x_S(t) * h(t)$ is recorded. By performing the convolution of the received signal
$y(t)$ with the inverse signal $x_{Sinv}(t)$ one obtains the impulse response from

$$y(t) * x_{Sinv}(t) = x_S(t) * h(t) * x_{Sinv}(t) = h(t - T_C). \tag{6.8}$$



6.1.4 Simulation of Room Impulse Responses 

The methods just described provide means for calculating the impulse response from the
geometry of a room and for measuring the impulse response of a real room. The
reproduction of such an impulse response is basically possible with the help of the fast convolution
method as described in Chapter 5. The ear signals at a listening position inside the room
are computed by

$$y_L(n) = \sum_{k=0}^{N-1} x(k) \cdot h_L(n - k), \tag{6.9}$$

$$y_R(n) = \sum_{k=0}^{N-1} x(k) \cdot h_R(n - k), \tag{6.10}$$

where $h_L(n)$ and $h_R(n)$ are the measured impulse responses between the source inside the
room, which generates the signal $x(n)$, and a dummy head with two ear microphones.
Special implementations of fast convolution with low latency are described in [Soo90,
Gar95, Rei95, Ege96, Mül99, Joh00, Mül01, Garc02] and a hybrid approach based on
convolution and recursive filters can be found in [Bro01]. Investigations regarding fast
convolution with sparse psychoacoustic-based room impulse responses are discussed in
[Iid95, Lee03a, Lee03b].
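For short signals the two convolution sums can be evaluated directly. The sketch below is a plain time-domain implementation for illustration only, not the fast convolution referred to above; the function names and the toy impulse responses are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution y(n) = sum_k x(k) * h(n - k)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


def binaural(x, h_left, h_right):
    """Left and right ear signals for a mono source signal x."""
    return convolve(x, h_left), convolve(x, h_right)


y_left, y_right = binaural([1.0, 2.0], h_left=[1.0, 0.5], h_right=[0.0, 1.0])
```

The direct sum needs on the order of $N \cdot M$ multiplications for signal length $N$ and response length $M$, which is why block-based fast convolution is preferred for room impulse responses of realistic length.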

In the following sections we will consider special approaches for early reflections and 
subsequent reverberation, which allow a parametric adjustment of all relevant parameters 
of a room impulse response. With this approach an accurate room impulse response is 
not possible, but with a moderate computational complexity a satisfying solution from an 
acoustic point of view can be achieved. In Section 6.4 an efficient implementation of the
convolutions (6.9) and (6.10) with a multirate signal processing approach [Zöl90, Sch92,
Sch93, Sch94] is discussed.

6.2 Early Reflections 

Early reflections decisively affect room perception. Spatial impression is produced by 
early reflections which reach the listener laterally. The significance of lateral reflections 
in creating spatial impression was investigated by Barron [Bar71, Bar81]. Fundamental 
investigations of concert halls and their different acoustics are described by Ando [And90]. 

6.2.1 Ando’s Investigations 

The results of the investigations by Ando are summarized in the following: 

• Preferred delay time of a single reflection: with the ACF of the signal, the delay is
determined from $|r_{xx}(\Delta t_1)| = 0.1 \cdot r_{xx}(0)$.

• Preferred direction of a single reflection: ±(55° ± 20°).

• Preferred amplitude of a single reflection: $A_1 = \pm 5$ dB.

• Preferred spectrum of a single reflection: no spectral shaping.

• Preferred delay time of a second reflection: $\Delta t_2 = 1.8 \cdot \Delta t_1$.

• Preferred reverberation time: $T_{60} = 23 \cdot \Delta t_1$.
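The dependence on the signal can be made concrete with a small calculation. The sketch below assumes that $\Delta t_1$ is taken as the first lag at which the magnitude of the ACF has dropped to 10% of $r_{xx}(0)$, which is a simplifying reading of the first rule; the function name `ando_preferences` and the exponentially decaying test ACF are hypothetical.

```python
def ando_preferences(acf, fs):
    """Derive preferred delays and reverberation time from an ACF.

    acf: autocorrelation values r_xx(0), r_xx(1), ... of the audio signal
    fs:  sampling rate in Hz
    """
    r0 = acf[0]
    # Assumption: first lag where |r_xx| has decayed to 0.1 * r_xx(0).
    lag = next(n for n, r in enumerate(acf) if abs(r) <= 0.1 * r0)
    dt1 = lag / fs
    return {"dt1": dt1, "dt2": 1.8 * dt1, "t60": 23.0 * dt1}


acf = [0.9 ** n for n in range(100)]       # hypothetical ACF envelope
prefs = ando_preferences(acf, fs=1000.0)
print(prefs)
```

A signal whose ACF decays slowly, such as sustained music, yields a longer $\Delta t_1$ and therefore a longer preferred reverberation time than a quickly decorrelating signal such as speech.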

These results show that in terms of perception, a preferred pattern of reflections as well 
as the reverberation time depend decisively on the audio signal. Hence, for different audio 
signals like classical music, pop music, speech or musical instruments entirely different 
requirements for early reflections and reverberation time have to be considered. 

6.2.2 Gerzon Algorithm 

The commonly used method of simulating early reflections is shown in Figs 6.5 and 6.6. 
The signal is weighted and fed into a system generating early reflections, followed by an 
addition to the input signal. The first M reflections are implemented by reading samples 
from a delay line and weighting these samples with a corresponding factor $g_i$ (see Fig. 6.6).
The design of a system for simulating early reflections will now be described as proposed
by Gerzon [Ger92].
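The delay-line structure described above can be sketched as follows. This is a minimal illustration assuming integer sample delays $d_i$, tap weights $g_i$ and a single weighting factor for the whole early-reflection path; the names `early_reflections` and `er_gain` and the tap values are hypothetical.

```python
def early_reflections(x, delays, gains, er_gain=1.0):
    """Add M weighted taps of a delay line to the direct signal.

    delays: tap positions d_i in samples
    gains:  tap weights g_i
    """
    y = list(x)                                   # direct signal
    for d, g in zip(delays, gains):
        for n in range(d, len(x)):
            y[n] += er_gain * g * x[n - d]        # i-th reflection
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = early_reflections(impulse, delays=[2, 4], gains=[0.5, 0.25])
```

Feeding a unit impulse through the structure returns the direct signal followed by the reflection pattern, so the tap delays and weights can be read off the output directly.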
Figure 6.5 Simulation of early reflections.

Figure 6.6 Early reflections.



Craven Hypothesis. The Craven hypothesis [Ger92] states that the human perception of
the distance to a sound source is evaluated with the help of the amplitude and delay time
ratios of the direct signal and early reflections as given by

$$ g = \frac{d}{d'}, \tag{6.11} $$

$$ T_D = \frac{d' - d}{c} \tag{6.12} $$

$$ \Rightarrow \quad d = \frac{c\,T_D}{g^{-1} - 1}, \tag{6.13} $$

where $d$ is the distance of the source, $d'$ the distance of the image source of the first
reflection, $g$ the amplitude of the first reflection relative to the direct signal, $c$ the velocity
of sound, and $T_D$ the relative delay time of the first reflection to the direct signal.

Without a reflection, human beings are not able to determine the distance $d$ to a sound
source. The extended Craven hypothesis includes the absorption coefficient $r$ for
determining

$$ g = \frac{d}{d'} \exp(-r T_D), \tag{6.14} $$

$$ T_D = \frac{d' - d}{c} \tag{6.15} $$

$$ \Rightarrow \quad d = \frac{c\,T_D}{g^{-1} \exp(-r T_D) - 1}, \tag{6.16} $$

$$ g = \frac{\exp(-r T_D)}{1 + c\,T_D/d}. \tag{6.17} $$

For a given reverberation time $T_{60}$, the absorption coefficient can be calculated by using
$\exp(-r T_{60}) = 1/1000$, that is,

$$ r = (\ln 1000)/T_{60}. \tag{6.18} $$



With the relationships (6.15) and (6.17), the parameters for an early reflections simulator 
as shown in Fig. 6.5 can be determined. 
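
As a numerical illustration, the following Python sketch evaluates the relations of the extended Craven hypothesis: the absorption coefficient from the reverberation time (6.18), the relative delay (6.15), the relative amplitude (6.17) and, as a check, the distance recovered with (6.16). The function names, the value of 343 m/s for the velocity of sound and the example geometry are assumptions of this sketch.

```python
import math

C_SOUND = 343.0  # velocity of sound in m/s (assumed value)


def absorption_coefficient(t60):
    """r from exp(-r * T60) = 1/1000, Eq. (6.18)."""
    return math.log(1000.0) / t60


def reflection_delay(d, d_image, c=C_SOUND):
    """Relative delay T_D = (d' - d) / c, Eq. (6.15)."""
    return (d_image - d) / c


def reflection_gain(d, t_d, r, c=C_SOUND):
    """Relative amplitude g = exp(-r T_D) / (1 + c T_D / d), Eq. (6.17)."""
    return math.exp(-r * t_d) / (1.0 + c * t_d / d)


def source_distance(g, t_d, r, c=C_SOUND):
    """Distance d = c T_D / (exp(-r T_D) / g - 1), Eq. (6.16)."""
    return c * t_d / (math.exp(-r * t_d) / g - 1.0)


# Hypothetical example: source at 5 m, image source at 8 m, T60 = 2 s.
r = absorption_coefficient(2.0)
t_d = reflection_delay(5.0, 8.0)
g = reflection_gain(5.0, t_d, r)
print(t_d, g, source_distance(g, t_d, r))
```

Recovering the distance from the pair (g, T_D) mirrors the statement of the hypothesis: delay and amplitude of the first reflection together determine the perceived distance.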
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Gerzon’s Distance Algorithm. For a system simulating early reflections produced by more 
than one sound source, Gerzon’s distance algorithm can be used [Ger92], where several 
sound sources are placed at different distances as well as in the stereo position into a 
stereophonic sound field. An application of this technique is mainly used in multichannel 
mixing consoles. 

By shifting a sound source by $-\delta$ (decrease of relative delay time), the relative delay
time of the first reflection becomes $T_D - \delta/c = [d' - (d + \delta)]/c$, and the relative
amplitude according to (6.17) becomes

$$ \frac{\exp(-r(T_D - \delta/c))}{1 + \dfrac{c\,(T_D - \delta/c)}{d + \delta}} = \left[ \frac{d + \delta}{d} \exp(r\delta/c) \right] \cdot \frac{\exp(-r T_D)}{1 + c\,T_D/d}. \tag{6.19} $$

This results in a delay and a gain factor for the direct signal (see Fig. 6.7) as given by

$$ d_2 = d + \delta, \tag{6.20} $$

$$ t_D = \delta/c, \tag{6.21} $$

$$ g_D = \frac{d}{d + \delta} \exp(-r\delta/c). \tag{6.22} $$




Figure 6.7 Delay and weighting of the direct signal. 
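
The delay and gain of the direct signal can be checked numerically. The sketch below computes t_D and g_D for a shift δ and verifies that weighting the direct signal with g_D is equivalent to the increase of the relative reflection amplitude obtained from (6.17) for the shifted source. Function names, the velocity of sound and the example values are assumptions.

```python
import math

C_SOUND = 343.0  # velocity of sound in m/s (assumed value)


def reflection_gain(d, t_d, r, c=C_SOUND):
    """Relative amplitude of the first reflection, Eq. (6.17)."""
    return math.exp(-r * t_d) / (1.0 + c * t_d / d)


def direct_path_for_shift(d, delta, r, c=C_SOUND):
    """Delay t_D in seconds and gain g_D of the direct signal, Eqs. (6.21), (6.22)."""
    t_delay = delta / c
    g_direct = d / (d + delta) * math.exp(-r * delta / c)
    return t_delay, g_direct


# Hypothetical example: source at 5 m shifted by 1 m, T_D = 10 ms, r = 3.45 1/s.
d, delta, t_d, r = 5.0, 1.0, 0.010, 3.45
t_delay, g_direct = direct_path_for_shift(d, delta, r)
g_reference = reflection_gain(d, t_d, r)
g_shifted = reflection_gain(d + delta, t_d - delta / C_SOUND, r)
print(t_delay, g_direct, g_shifted * g_direct, g_reference)
```

The product of the shifted relative amplitude and g_D equals the relative amplitude of the unshifted source, so only the direct path has to be modified while the reflection generator stays unchanged.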

By shifting a sound source by $+\delta$ (increase in relative delay time) the relative delay
time of the first reflection is $T_D + \delta/c = [d' - (d - \delta)]/c$. As a consequence, the delay
and the gain factor for the effect signal (see Fig. 6.8) are given by

$$ d_2 = d - \delta, \tag{6.23} $$

$$ t_E = \delta/c, \tag{6.24} $$

$$ g_E = \frac{d}{d + \delta} \exp(-r\delta/c). \tag{6.25} $$



Using two delay systems in the direct signal as well as in the reflection path, two coupled
weighting factors and delay lengths (see Fig. 6.9) can be obtained. For multichannel
applications like digital mixing consoles, the scheme in Fig. 6.10 is suggested by Gerzon
[Ger92]. Only one system for implementing early reflections is necessary.

Figure 6.8 Delay and weighting of effect signal.

Figure 6.9 Coupled factors and delays.



Stereo Implementation. In many applications, stereo signals have to be processed (see 
Fig. 6.11). For this purpose, reflections from both sides with positive and negative angles 
are implemented to avoid stereo displacements. The weighting is done with

$$ \frac{\exp(-r T_i)}{1 + c\,T_i/d} \begin{pmatrix} \cos\Theta_i & -\sin\Theta_i \\ \sin\Theta_i & \cos\Theta_i \end{pmatrix}. \tag{6.26} $$



For each reflection, a weighting factor and an angle have to be considered. 
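
A minimal Python sketch of this weighting is given below. It builds the 2 × 2 weighting matrix of one reflection from its delay T_i and angle Θ_i and applies it to a pair of left and right samples. How the matrix is applied to the stereo pair, the function names and all numerical values are assumptions of this sketch.

```python
import math


def stereo_weights(t_i, theta_deg, d, r, c=343.0):
    """Gain of (6.17) for delay t_i, multiplied by a rotation matrix for the angle."""
    gain = math.exp(-r * t_i) / (1.0 + c * t_i / d)
    theta = math.radians(theta_deg)
    return [[gain * math.cos(theta), -gain * math.sin(theta)],
            [gain * math.sin(theta), gain * math.cos(theta)]]


def apply_weights(w, left, right):
    """Apply the 2 x 2 weighting matrix to one stereo sample pair (assumed usage)."""
    return (w[0][0] * left + w[0][1] * right,
            w[1][0] * left + w[1][1] * right)


# Hypothetical pair of reflections at +30 and -30 degrees with equal delay.
w_pos = stereo_weights(0.010, 30.0, d=5.0, r=3.45)
w_neg = stereo_weights(0.010, -30.0, d=5.0, r=3.45)
print(apply_weights(w_pos, 1.0, 0.0), apply_weights(w_neg, 1.0, 0.0))
```

The matrices for the positive and the negative angle are transposes of each other, which reflects the symmetric placement of reflections on both sides.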

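Generation of Early Reflections with Increasing Time Density. In [Schr61] it is stated
that the time density of reflections increases with the square of time:

$$ \text{Number of reflections per second} = (4\pi c^3 / V) \cdot t^2. \tag{6.27} $$

After time $t_C$ the reflections have a statistical decay behavior. For a pulse width of $\Delta t$,
individual reflections overlap after the time at which (6.27) reaches one reflection per
pulse width, $(4\pi c^3 / V) \cdot t_C^2 \cdot \Delta t = 1$, that is,

$$ t_C = \sqrt{\frac{V}{4\pi c^3\, \Delta t}}. \tag{6.28} $$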
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Figure 6.11 Stereo reflections. 



To avoid overlap of reflections, Gerzon [Ger92] suggests increasing the density of reflections
with $t^p$ (for example, $p = 1$ or $p = 0.5$ leads to $t$ or $t^{0.5}$). In the interval (0, 1], with initial
value $y_0$ and a number $k$ between 0.5 and 1, the following procedure is performed:

$$ y_i = y_0 + ik \pmod 1, \quad i = 0, 1, \ldots, M - 1. \tag{6.29} $$

The numbers $y_i$ in the interval (0, 1] are now transformed to time delays $T_i$ in the interval
$[T_{\min}, T_{\min} + T_{\max}]$ by

$$ b = T_{\min}^{1+p}, \tag{6.30} $$

$$ a = (T_{\max} + T_{\min})^{1+p} - b, \tag{6.31} $$

$$ T_i = (a\,y_i + b)^{1/(1+p)}. \tag{6.32} $$

The increase in the density of reflections is shown by the example in Fig. 6.12.
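
The procedure (6.29) to (6.32) translates almost directly into code. The following Python sketch generates the delay times of M reflections; the function name, the argument names and the example values are assumptions. Note that the modulo operation used here yields values in [0, 1) rather than (0, 1], which only affects the end points of the interval.

```python
def reflection_delays(num, y0, k, t_min, t_max, p):
    """Delay times T_i of `num` reflections with density increasing as t**p."""
    b = t_min ** (1.0 + p)                      # Eq. (6.30)
    a = (t_max + t_min) ** (1.0 + p) - b        # Eq. (6.31)
    delays = []
    for i in range(num):
        y = (y0 + i * k) % 1.0                  # Eq. (6.29)
        delays.append((a * y + b) ** (1.0 / (1.0 + p)))  # Eq. (6.32)
    return sorted(delays)


# Hypothetical example: nine reflections between 10 ms and 50 ms, density ~ t.
for t in reflection_delays(9, y0=0.1, k=0.7, t_min=0.010, t_max=0.040, p=1.0):
    print(round(1000.0 * t, 2), "ms")
```

With p = 0 the mapping is linear and the reflections are spread uniformly; larger p moves more reflections toward the end of the interval.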
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Figure 6.12 Increase in density for nine reflections. 



6.3 Subsequent Reverberation 



This section deals with techniques for reproducing subsequent reverberation. The first
approaches by Schroeder [Schr61, Schr62] and their extension by Moorer [Moo78] will
be described. Further developments by Stautner and Puckette [Sta82], Smith [Smi85],
Dattorro [Dat97], and Gardner [Gar98] led to general feedback networks [Ger71, Ger76,
Jot91, Jot92, Roc95, Roc96, Roc97, Roc02] which have a random impulse response with
exponential decay. An extensive discussion on analysis and synthesis parameters of
subsequent reverberation can be found in [Ble01]. Apart from the echo density, an important
parameter of subsequent reverberation [Cre03] is the quadratic increase in

$$ \text{Frequency density} = \frac{4\pi V}{c^3} \cdot f^2 \tag{6.33} $$

with frequency. The following systems exhibit a quadratic increase in echo density and
frequency density.



6.3.1 Schroeder Algorithm 

The first software implementations of room simulation algorithms were carried out in 1961 
by Schroeder. The basis for simulating an impulse response with exponential decay is a 
recursive comb filter as shown in Fig. 6.13.

The transfer function is given by

$$ H(z) = \frac{z^{-M}}{1 - g z^{-M}} \tag{6.34} $$

$$ \phantom{H(z)} = \sum_{k=0}^{M-1} \frac{A_k}{z - z_k} \tag{6.35} $$
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Figure 6.13 Recursive comb filter (g = feedback factor, M = delay length). 



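with

$$ A_k = \frac{z_k}{Mg} \quad \text{residues}, \tag{6.36} $$

$$ z_k = r\,e^{j2\pi k/M} \quad \text{poles}, \tag{6.37} $$

$$ r = g^{1/M} \quad \text{pole radius}. \tag{6.38} $$

With the correspondence of the Z-transform $a/(z - a) \;\circ\!-\!\bullet\; \varepsilon(n-1)\,a^n$ the impulse
response is given by

$$ H(z) \;\circ\!-\!\bullet\; h(n), \qquad h(n) = \frac{\varepsilon(n-1)}{Mg} \sum_{k=0}^{M-1} z_k^n = \frac{\varepsilon(n-1)}{Mg}\, r^n \sum_{k=0}^{M-1} e^{j\Omega_k n}, \quad \Omega_k = 2\pi k/M. \tag{6.39} $$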



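The complex poles are combined as pairs so that the impulse response can be written as

$$ h(n) = \frac{\varepsilon(n-1)}{Mg}\, r^n \left[ 1 + \cos(\pi n) + 2 \sum_{k=1}^{M/2-1} \cos \Omega_k n \right], \quad M \text{ even}, \tag{6.40} $$

$$ h(n) = \frac{\varepsilon(n-1)}{Mg}\, r^n \left[ 1 + 2 \sum_{k=1}^{(M+1)/2-1} \cos \Omega_k n \right], \quad M \text{ odd}. \tag{6.41} $$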



The impulse response is expressed as a summation of cosine oscillations with frequencies
$\Omega_k$. These frequencies correspond to the eigenfrequencies of a room. They decay with an
exponential envelope $r^n$, where $r$ is the damping constant (see Fig. 6.15a). The overall
impulse response is weighted by $1/(Mg)$. The frequency response of the comb filter is shown
in Fig. 6.15c and is given by

$$ |H(e^{j\Omega})| = \sqrt{\frac{1}{1 - 2g\cos(\Omega M) + g^2}}. \tag{6.42} $$

It shows maxima at $\Omega = 2\pi k/M$ $(k = 0, 1, \ldots, M - 1)$ of magnitude

$$ |H(e^{j\Omega})|_{\max} = \frac{1}{1 - g} \tag{6.43} $$

and minima at $\Omega = (2k + 1)\pi/M$ $(k = 0, 1, \ldots, M - 1)$ of magnitude

$$ |H(e^{j\Omega})|_{\min} = \frac{1}{1 + g}. \tag{6.44} $$
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Another basis of the Schroeder algorithm is the all-pass filter shown in Fig. 6.14 with 
transfer function

$$ H(z) = \frac{z^{-M} - g}{1 - g z^{-M}} \tag{6.45} $$

$$ \phantom{H(z)} = \frac{z^{-M}}{1 - g z^{-M}} - \frac{g}{1 - g z^{-M}}. \tag{6.46} $$

From (6.46) it can be seen that the impulse response can also be expressed as a summation
of cosine oscillations.




The impulse responses and frequency responses of a comb filter and an all-pass filter are
presented in Fig. 6.15. Both impulse responses show an exponential decay. A sample in the
impulse response occurs every M sampling periods. The density of samples in the impulse
responses does not increase with time. For the recursive comb filter, spectral shaping due
to the maxima at the corresponding poles of the transfer function is observed.
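
Both building blocks can be reproduced with a few lines of Python. The sketch below implements the difference equations belonging to (6.34) and (6.45) and evaluates the comb filter magnitude (6.42); function names and the parameter values (M = 10, g = -0.6, as in the caption of Fig. 6.15) are chosen for illustration.

```python
import math


def comb_filter(x, m, g):
    """y(n) = x(n - M) + g y(n - M), i.e. H(z) = z^-M / (1 - g z^-M)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_del = x[n - m] if n >= m else 0.0
        y_del = y[n - m] if n >= m else 0.0
        y[n] = x_del + g * y_del
    return y


def allpass_filter(x, m, g):
    """y(n) = -g x(n) + x(n - M) + g y(n - M), i.e. H(z) = (z^-M - g) / (1 - g z^-M)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_del = x[n - m] if n >= m else 0.0
        y_del = y[n - m] if n >= m else 0.0
        y[n] = -g * x[n] + x_del + g * y_del
    return y


def comb_magnitude(omega, m, g):
    """Magnitude response of the comb filter, Eq. (6.42)."""
    return 1.0 / math.sqrt(1.0 - 2.0 * g * math.cos(omega * m) + g * g)


impulse = [1.0] + [0.0] * 30
h_comb = comb_filter(impulse, m=10, g=-0.6)
h_allpass = allpass_filter(impulse, m=10, g=-0.6)
print([round(v, 3) for v in h_comb[::10]])
print([round(v, 3) for v in h_allpass[::10]])
```

In both impulse responses the nonzero samples are M sampling periods apart, which illustrates why the echo density of a single filter does not increase with time.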



Frequency Density 



The frequency density describes the number of eigenfrequencies per hertz and is defined
for a comb filter [Jot91] as

$$ D_f = M \cdot T_S \quad [1/\text{Hz}]. \tag{6.47} $$

A single comb filter gives M resonances in the interval $[0, 2\pi]$, which are separated by a
frequency distance of $\Delta f = f_S/M$. In order to increase the frequency density, a parallel
circuit (see Fig. 6.16) of P comb filters is used which leads to

$$ H(z) = \sum_{p=1}^{P} \frac{z^{-M_p}}{1 - g_p z^{-M_p}} = \frac{z^{-M_1}}{1 - g_1 z^{-M_1}} + \frac{z^{-M_2}}{1 - g_2 z^{-M_2}} + \cdots. \tag{6.48} $$

The choice of the delay systems [Schr62] is suggested as

$$ M_1 : M_P = 1 : 1.5 \tag{6.49} $$

and leads to a frequency density

$$ D_f = \sum_{p=1}^{P} M_p \cdot T_S = P \cdot \overline{M} \cdot T_S. \tag{6.50} $$
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Figure 6.15 (a) Impulse response of a comb filter ( M = 10, g = —0.6). (b) Impulse response of an 
all-pass filter ( M = 10, g = —0.6). (c) Frequency response of a comb filter, (d) Frequency response 
of an all-pass filter. 



In [Schr62] a necessary frequency density of $D_f = 0.15$ eigenfrequencies per hertz is
proposed.



Echo Density 

The echo density is the number of reflections per second and is defined for a comb filter
[Jot91] as

$$ D_t = \frac{1}{M \cdot T_S} \quad [1/\text{s}]. \tag{6.51} $$

For a parallel circuit of comb filters, the echo density is given by

$$ D_t = \sum_{p=1}^{P} \frac{1}{M_p \cdot T_S} = P\, \frac{1}{\overline{M} \cdot T_S}. \tag{6.52} $$
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Figure 6.16 Parallel circuit of comb filters. 



With (6.50) and (6.52), the number of parallel comb filters and the mean delay length are
given by

$$ P = \sqrt{D_f \cdot D_t}, \tag{6.53} $$

$$ \overline{M}\, T_S = \sqrt{D_f / D_t}. \tag{6.54} $$

For a frequency density $D_f = 0.15$ and an echo density $D_t = 1000$ it can be concluded that
the number of parallel comb filters is $P = 12$ and the mean delay length is $\overline{M}\,T_S = 12$ ms.
Since the frequency density is proportional to the reverberation time, the number of parallel
comb filters has to be increased accordingly.
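
The numbers quoted above follow directly from (6.53) and (6.54), as the following short Python sketch shows. The function name is an assumption.

```python
import math


def comb_bank_design(freq_density, echo_density):
    """Number of parallel comb filters and mean delay (s), Eqs. (6.53), (6.54)."""
    num_filters = math.sqrt(freq_density * echo_density)
    mean_delay = math.sqrt(freq_density / echo_density)
    return num_filters, mean_delay


p, mean_delay = comb_bank_design(freq_density=0.15, echo_density=1000.0)
print(round(p), "comb filters, mean delay", round(1000.0 * mean_delay, 1), "ms")
```

Raising the required echo density to 10000 reflections per second with the same frequency density would call for roughly three times as many comb filters, which is one motivation for the additional all-pass sections.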

A further increase in the echo density is achieved by a cascade circuit of $P_A$ all-pass
filters (see Fig. 6.17) with transfer function

$$ H(z) = \prod_{p=1}^{P_A} \frac{z^{-M_p} - g_p}{1 - g_p z^{-M_p}}. \tag{6.55} $$

These all-pass sections are connected in series with the parallel circuit of comb filters. For
a sufficient echo density, 10000 reflections per second are necessary [Gri89].



Avoiding Unnatural Resonances 

Since the impulse response of a single comb filter can be described as a sum of M (delay
length) decaying sinusoidal oscillations, the short-time FFT of consecutive parts from this
impulse response gives the frequency response shown in Fig. 6.18 in the time-frequency
domain. Only the maxima are presented. The parallel circuit of comb filters with the
condition (6.49) leads to radii of pole distribution as given by $r_p = g_p^{1/M_p}$ $(p = 1, 2, \ldots, P)$.
In order to avoid unnatural resonances, the radii of the pole distribution of a parallel circuit
of comb filters must satisfy the condition

$$ r_p = \text{const.} = g_p^{1/M_p}, \quad \text{for } p = 1, 2, \ldots, P. \tag{6.56} $$

This leads to the short-time spectra and the pole distribution as shown in Fig. 6.19.
Figure 6.20 shows the impulse response and the echogram (logarithmic presentation of
the amplitude of the impulse response) of a parallel circuit of comb filters with equal and
unequal pole radii. For unequal pole radii, the different decay times of the eigenfrequencies
can be seen.

Figure 6.17 Cascade circuit of all-pass filters.




Figure 6.18 Short-time spectra of a comb filter (M = 8). 



Reverberation Time 



The reverberation time of a recursive comb filter can be adjusted with the feedback factor
$g$ which describes the ratio

$$ g = \frac{h(n)}{h(n - M)} \tag{6.57} $$

of two different nonzero samples of the impulse response separated by M sampling periods.
The factor $g$ describes the decay constant per M samples. The decay constant per sampling
period can be calculated from the pole radius $r = g^{1/M}$ and is defined as

$$ r = \frac{h(n)}{h(n - 1)}. \tag{6.58} $$

Figure 6.19 Short-time spectra of a parallel circuit of comb filters.

Figure 6.20 Impulse response and echogram.



The relationship between feedback factor $g$ and pole radius $r$ can also be expressed using
(6.57) and (6.58) and is given by

$$ g = \frac{h(n)}{h(n - M)} = \frac{h(n)}{h(n - 1)} \cdot \frac{h(n - 1)}{h(n - 2)} \cdots \frac{h(n - (M - 1))}{h(n - M)} = r^M. \tag{6.59} $$

With the constant radius $r = g_p^{1/M_p}$ and the logarithmic parameters $R = 20 \log_{10} r$ and
$G_p = 20 \log_{10} g_p$, the attenuation per sampling period is given by

$$ R = \frac{G_p}{M_p}. \tag{6.60} $$

The reverberation time is defined as decay time of the impulse response to $-60$ dB. With
$-60/T_{60} = R/T_S$, the reverberation time can be written as

$$ T_{60} = -60\, \frac{T_S}{R} = -60\, \frac{T_S M_p}{G_p} = \frac{3}{\log_{10} |1/g_p|} \cdot M_p \cdot T_S. \tag{6.61} $$



The control of reverberation time can either be carried out with the feedback factor g or 
the delay parameter M. The increase in the reverberation time with factor g is responsible 
for a pole radius close to the unit circle and, hence, leads to an amplification of maxima of 
the frequency response (see (6.43)). This leads to a coloring of the sound impression. The 
increase in the delay parameter M, on the other hand, leads to an impulse response whose 
nonzero samples are far apart from each other, so that individual echoes can be heard. The 
discrepancy between echo density and frequency density for a given reverberation time can 
be solved by a sufficient number of parallel comb filters. 
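
The relation (6.61) between feedback factor, delay length and reverberation time can be evaluated in both directions. The following Python sketch computes T_60 for a given comb filter and, conversely, the feedback factors that give all filters of a parallel circuit the same reverberation time and hence the same pole radius as required by (6.56). Function names, the sampling rate and the delay lengths are assumptions.

```python
import math


def reverberation_time(g, m, fs):
    """T60 = 3 / log10(1/|g|) * M * T_S, Eq. (6.61)."""
    return 3.0 / math.log10(1.0 / abs(g)) * m / fs


def feedback_for_t60(t60, m, fs):
    """Feedback factor g that yields the reverberation time t60, from Eq. (6.61)."""
    return 10.0 ** (-3.0 * m / (fs * t60))


fs = 44100.0
delays = [1000, 1250, 1500]        # hypothetical delay lengths in samples
gains = [feedback_for_t60(2.0, m, fs) for m in delays]
for m, g in zip(delays, gains):
    print(m, round(g, 4), round(g ** (1.0 / m), 8), reverberation_time(g, m, fs))
```

Longer delay lines receive smaller feedback factors, while the pole radius g^(1/M) is identical for all filters.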



Frequency-dependent Reverberation Time 

The eigenfrequencies of rooms have a rapid decay for high frequencies. A frequency-dependent
reverberation time can be implemented with a low-pass filter

$$ H_1(z) = \frac{1}{1 - a z^{-1}} \tag{6.62} $$

in the feedback loop of a comb filter. The modified comb filter in Fig. 6.21 has transfer
function

$$ H(z) = \frac{z^{-M}}{1 - g H_1(z) z^{-M}} \tag{6.63} $$

with the stability criterion

$$ \frac{g}{1 - a} < 1. \tag{6.64} $$



Figure 6.21 Modified low-pass comb filter. 
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The short-time spectra and the pole distribution of a parallel circuit with low-pass comb 
filters are presented in Fig. 6.22. Low eigenfrequencies decay more slowly than higher ones. 
The circular pole distribution becomes an elliptical distribution where the low-frequency 
poles are moved toward the unit circle. 
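
A possible realization of the low-pass comb filter is sketched below in Python. It uses an auxiliary signal w(n) for the output of the low-pass filter in the feedback loop, so that y(n) = x(n - M) + g w(n - M) and w(n) = y(n) + a w(n - 1). The function name, the auxiliary signal and the parameter values are assumptions, and the stability check presumes 0 ≤ a < 1 and g ≥ 0.

```python
def lowpass_comb(x, m, g, a):
    """Comb filter with the low-pass 1 / (1 - a z^-1) in the feedback loop."""
    if g / (1.0 - a) >= 1.0:
        raise ValueError("unstable: g / (1 - a) must be smaller than 1")
    y = [0.0] * len(x)
    w = [0.0] * len(x)          # low-pass filtered feedback signal
    for n in range(len(x)):
        x_del = x[n - m] if n >= m else 0.0
        w_del = w[n - m] if n >= m else 0.0
        y[n] = x_del + g * w_del
        w[n] = y[n] + a * (w[n - 1] if n > 0 else 0.0)
    return y


impulse = [1.0] + [0.0] * 15
print([round(v, 3) for v in lowpass_comb(impulse, m=4, g=0.5, a=0.2)])
```

In contrast to the plain comb filter, each pass through the loop smears the echo over the following samples, which corresponds to the faster decay of the high frequencies.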





Figure 6.22 Short-time spectra of a parallel circuit of low-pass comb filters. 



Stereo Room Simulation 

An extension of the Schroeder algorithm was suggested by Moorer [Moo78]. In addition to 
a parallel circuit of comb filters in series with a cascade of all-pass filters, a pattern of early 
reflections is generated. Figure 6.23 shows a room simulation system for a stereo signal. 
The generated room signals $e_L(n)$ and $e_R(n)$ are added to the direct signals $x_L(n)$ and
$x_R(n)$. The input of the room simulation is the mono signal $x_M(n) = x_L(n) + x_R(n)$ (sum
signal). This mono signal is added to the left and right room signals after going through
a delay line DEL1. The total sum of all reflections is fed via another delay line DEL2 to
a parallel circuit of comb filters which implements subsequent reverberation. In order to
get a high-quality spatial impression, it is necessary to decorrelate the room signals $e_L(n)$
and $e_R(n)$ [Bla74, Bla85]. This can be achieved by taking left and right room signals at
different points out of the parallel circuit of comb filters. These room signals are then fed
to an all-pass section to increase the echo density.

Besides the described system for stereo room simulation in which the mono signal is
processed with a room algorithm, it is also possible to perform complete stereo processing
of $x_L(n)$ and $x_R(n)$, or to process a mono signal $x_M(n) = x_L(n) + x_R(n)$ and a side
(difference) signal $x_S(n) = x_L(n) - x_R(n)$ individually.



6.3.2 General Feedback Systems 

Further developments of the comb filter method by Schroeder tried to improve the acoustic 
quality of reverberation and especially the increase in echo density [Ger71, Ger76, Sta82,
Jot91, Jot92, Roc95, Roc96, Roc97a, RS97b]. With respect to [Jot91], the general feedback
system in Fig. 6.24 is considered. For the sake of simplicity only three delay systems are 
shown. The feedback of output signals is carried out with the help of a matrix A which 
feeds back each of the three outputs to the three inputs. 
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Figure 6.24 General feedback system. 
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In general, for N delay systems we can write 

$$ y(n) = \sum_{i=1}^{N} c_i\, q_i(n) + d\,x(n), \tag{6.65} $$

$$ q_j(n + m_j) = \sum_{i=1}^{N} a_{ij}\, q_i(n) + b_j\, x(n), \quad 1 \le j \le N. \tag{6.66} $$

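The Z-transform leads to

$$ Y(z) = \mathbf{c}^T \mathbf{Q}(z) + d \cdot X(z), \tag{6.67} $$

$$ \mathbf{D}(z) \cdot \mathbf{Q}(z) = \mathbf{A} \cdot \mathbf{Q}(z) + \mathbf{b} \cdot X(z) \;\rightarrow\; \mathbf{Q}(z) = [\mathbf{D}(z) - \mathbf{A}]^{-1}\, \mathbf{b} \cdot X(z), \tag{6.68} $$

with

$$ \mathbf{Q}(z) = \begin{bmatrix} Q_1(z) \\ \vdots \\ Q_N(z) \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_N \end{bmatrix} \tag{6.69} $$

and the diagonal delay matrix

$$ \mathbf{D}(z) = \text{diag}[z^{-m_1} \cdots z^{-m_N}]. \tag{6.70} $$

With (6.68) the Z-transform of the output is given by

$$ Y(z) = \mathbf{c}^T [\mathbf{D}(z) - \mathbf{A}]^{-1}\, \mathbf{b} \cdot X(z) + d \cdot X(z) \tag{6.71} $$

and the transfer function by

$$ H(z) = \mathbf{c}^T [\mathbf{D}(z) - \mathbf{A}]^{-1}\, \mathbf{b} + d. \tag{6.72} $$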

The system is stable if the feedback matrix A can be expressed as a product of a
unitary matrix U ($\mathbf{U}^{-1} = \mathbf{U}^T$) and a diagonal matrix with $g_{ii} < 1$ (derivation in [Sta82]).
Figure 6.25 shows a general feedback system with input vector X(z), output vector Y(z),
a diagonal matrix D(z) consisting of purely delay systems $z^{-m_i}$, and a feedback matrix A.
This feedback matrix consists of an orthogonal matrix U multiplied by the matrix G which
results in a weighting of the feedback matrix A.

If an orthogonal matrix U is chosen and the weighting matrix is equal to the unit matrix 
G = I, the system in Fig. 6.25 implements a white-noise random signal with Gaussian 
distribution when a pulse excitation is applied to the input. The time density of this signal 
slowly increases with time. If the diagonal elements of the weighting matrix G are less 
than one, a random signal with exponential amplitude decay results. With the help of 
the weighting matrix G, the reverberation time can be adjusted. Such a feedback system 
performs the convolution of an audio input signal with an impulse response of exponential 
decay. 

The effect of the orthogonal matrix U on the subjective sound perception of subsequent
reverberation is of particular interest. A relationship between the distribution of the
eigenvalues of the matrix U on the unit circle and the poles of the system transfer function cannot
be described analytically, owing to the high order of the feedback system. In [Her94], it is
shown experimentally that the distribution of eigenvalues within the right-hand or left-hand
complex plane produces a uniform distribution of poles of the system transfer function.
Such a feedback matrix leads to an acoustically improved reverberation. The echo density
rapidly increases to the maximum value of one sample per sampling period for a uniform
distribution of eigenvalues. Besides the feedback matrix, additional digital filtering is
necessary for spectrally shaping the subsequent reverberation and for implementing
frequency-dependent decay times (see [Jot91]). The following example illustrates the increase of the
echo density.



Example: First, a system with only one feedback path per comb filter is considered. The 
feedback matrix is then given by 



A = 




(6.73) 



Figure 6.26 shows the impulse response and the amplitude frequency response. 
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Figure 6.26 Impulse response and frequency response of a 4-delay system with a unit matrix as 
unitary feedback matrix (g = 0.83). 
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With the feedback matrix 



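$$ \mathbf{A} = \frac{g}{\sqrt{2}} \begin{bmatrix} 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & -1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{bmatrix} \tag{6.74} $$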



from [Sta82], the impulse response and the corresponding frequency response shown in 
Fig. 6.27 are obtained. In contrast to Fig. 6.26, an increase in the echo density of the impulse 
response is observed. 
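
The effect of the feedback matrix can be reproduced with a small Python sketch of a feedback system with four delay lines. It follows the recursion in matrix form, with row j of A feeding delay line j, and compares a diagonal feedback matrix with the coupled matrix from [Sta82]. The delay lengths, the value of g, the vectors b and c with all elements equal to one, and the function name are assumptions of this sketch; they are not the parameters behind Figs 6.26 and 6.27.

```python
import math


def fdn_impulse_response(a, delays, b, c, d, length):
    """Impulse response of y(n) = sum_i c_i q_i(n) + d x(n) with
    q_j(n + m_j) = sum_i a[j][i] q_i(n) + b_j x(n)."""
    n_lines = len(delays)
    q = [[0.0] * (length + max(delays)) for _ in range(n_lines)]
    y = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        state = [q[i][n] for i in range(n_lines)]       # delay line outputs
        y.append(sum(c[i] * state[i] for i in range(n_lines)) + d * x)
        for j in range(n_lines):
            feedback = sum(a[j][i] * state[i] for i in range(n_lines))
            q[j][n + delays[j]] = feedback + b[j] * x   # write m_j samples ahead
    return y


G = 0.8                                  # hypothetical feedback gain
DELAYS = [7, 11, 13, 17]                 # hypothetical delay lengths in samples
ONES = [1.0, 1.0, 1.0, 1.0]
UNITARY = [[0, 1, 1, 0], [-1, 0, 0, -1], [1, 0, 0, -1], [0, 1, -1, 0]]
A_DIAGONAL = [[G if i == j else 0.0 for i in range(4)] for j in range(4)]
A_COUPLED = [[G / math.sqrt(2.0) * v for v in row] for row in UNITARY]

h_diagonal = fdn_impulse_response(A_DIAGONAL, DELAYS, ONES, ONES, 0.0, 200)
h_coupled = fdn_impulse_response(A_COUPLED, DELAYS, ONES, ONES, 0.0, 200)
print(sum(1 for v in h_diagonal if v != 0.0), "nonzero samples (diagonal matrix)")
print(sum(1 for v in h_coupled if v != 0.0), "nonzero samples (coupled matrix)")
```

With the diagonal matrix each delay line acts as an independent comb filter, whereas the coupled matrix feeds every delay line output into two other delay lines, so that echoes also occur at sums of different delay lengths.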






Figure 6.27 Impulse response and frequency response of a 4-delay system with unitary feedback 
matrix (g = 0.63). 



6.3.3 Feedback All-pass Systems 

In addition to the general feedback systems, simple delay systems with feedback have been
used for room simulators (see Fig. 6.28). These simulators are based on a delay line,
where single delays are fed back with L feedback coefficients to the input. The sum of
the input signal and feedback signal is low-pass filtered or spectrally weighted by a
low-frequency shelving filter and is then put to the delay line again. The first N reflections are
extracted out of the delay line according to the reflection pattern of the simulated room.
They are weighted and added to the output signal. The mixing between the direct signal
and the room signal is adjusted by the factor $g_{MIX}$. The inner system can be described by a
rational transfer function $H(z) = Y(z)/X(z)$. In order to avoid a low frequency density the
feedback delay lengths can be made time-variant [Gri89, Gri91].

Increasing the echo density can be achieved by replacing the delays $z^{-M_i}$ by
frequency-dependent all-pass systems $A(z^{-M_i})$. This extension was first proposed by Gardner in
[Gar92, Gar98]. In addition to the replacement $z^{-M_i} \rightarrow A(z^{-M_i})$, the all-pass systems
can be extended by embedded all-pass systems [Gar92]. Figure 6.29 shows an all-pass
system (Fig. 6.29a) where the delay $z^{-M}$ is replaced by a further all-pass and a unit delay
$z^{-1}$ (Fig. 6.29b). The integration of a unit delay avoids delay-free loops. In Fig. 6.29c the
inner all-pass is replaced by a cascade of two all-pass systems and a further delay $z^{-M_3}$.
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Figure 6.28 Room simulation with delay line and forward and backward coefficients. 



The resulting system is again an all-pass system [Gar92, Gar98]. A further modification of
the general all-pass system is shown in Fig. 6.29d [Dat97, Vaa97, Dah00]. Here, a delay
$z^{-M}$ followed by a low-pass and a weighting coefficient is used. The resulting system
is called an absorbent all-pass system. With these embedded all-pass systems the room
simulator shown in Fig. 6.28 is extended to a feedback all-pass system which is shown in
Fig. 6.30 [Gar92, Gar98]. The feedback is performed by a low-pass filter and a feedback
coefficient $g$, which adjusts the decay behavior. The extension to a stereo room simulator
is described in [Dat97, Dah00] and is depicted in Fig. 6.31 [Dah00]. The cascaded all-pass
systems $A_i(z)$ in the left and right channel can be a combination of embedded and absorbent
all-pass systems. Both output signals of the all-pass chains are fed back to the input and
added. In front of both all-pass chains a coupling of both channels with a weighted sum
and difference is performed. The setup and parameters of such a system are discussed in
[Dah00]. A precise adjustment of reverberation time and control of echo density can be
achieved by the feedback coefficients of the all-passes. The frequency density is controlled
by the scaling of the delay lengths of the inner all-pass systems.



6.4 Approximation of Room Impulse Responses 

In contrast to the systems for simulation of room impulse responses discussed up to this
point, a method is now presented that measures and approximates the room impulse
response in one step [Zol90b, Sch92, Sch93] (see Fig. 6.32). Moreover, it leads to a parametric
representation of the room impulse response. Since the decay times of room impulse
responses decrease for high frequencies, use is made of multirate signal processing.

The analog system that is to be measured and approximated is excited with a binary
pseudo-random sequence $x(n)$ via a DA converter. The resulting room signal gives a digital
sequence $y(n)$ after AD conversion. The discrete-time sequence $y(n)$ and the pseudo-random
sequence $x(n)$ are each decomposed by an analysis filter bank into subband signals
$y_1, \ldots, y_P$ and $x_1, \ldots, x_P$ respectively. The sampling rate is reduced in accordance with
the bandwidth of the signals. The subband signals $y_1, \ldots, y_P$ are approximated by adjusting
the subband systems $H_1(z) = A_1(z)/B_1(z), \ldots, H_P(z) = A_P(z)/B_P(z)$. The outputs
$\hat{y}_1, \ldots, \hat{y}_P$ of these subband systems give an approximation of the measured subband
signals. With this procedure, the impulse response is given in parametric form (subband
parameters) and can be directly simulated in the digital domain.

Figure 6.29 Embedded and absorbing all-pass system [Gar92, Gar98, Dat97, Vaa97, Dah00].

Figure 6.30 Room simulator with embedded all-pass systems [Gar92, Gar98].

Figure 6.31 Stereo room simulator with absorbent all-pass systems [Dah00].
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Figure 6.32 System measuring and approximating room impulse responses. 
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By suitably adjusting the analysis filter bank [Sch94], the subband impulse responses 
are obtained directly from the cross-correlation function 



$$ h_i = r_{x_i y_i}. \tag{6.75} $$



The subband impulse responses are approximated by a nonrecursive filter and a recursive
comb filter. The cascade of both filters leads to the transfer function

$$ H_i(z) = \frac{b_0 + \cdots + b_{M_i} z^{-M_i}}{1 - g_i z^{-N_i}} = \sum_{n_i=0}^{\infty} h_i(n_i)\, z^{-n_i}, \tag{6.76} $$

which is set equal to the impulse response in subband $i$. Multiplying both sides of (6.76)
by the denominator $1 - g_i z^{-N_i}$ gives

$$ b_0 + \cdots + b_{M_i} z^{-M_i} = \left( \sum_{n_i=0}^{\infty} h_i(n_i)\, z^{-n_i} \right) (1 - g_i z^{-N_i}). \tag{6.77} $$



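Truncating the impulse response of each subband to K samples and comparing the
coefficients of powers of $z$ on both sides of the equation, the following set of equations is
obtained:

$$ \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_M \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 & \cdots & 0 \\ h_1 & h_0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ h_M & h_{M-1} & h_{M-2} & \cdots & h_{M-N} \\ h_{M+1} & h_M & h_{M-1} & \cdots & h_{M-N+1} \\ \vdots & \vdots & \vdots & & \vdots \\ h_K & h_{K-1} & h_{K-2} & \cdots & h_{K-N} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ -g \end{bmatrix}. \tag{6.78} $$

The coefficients $b_0, \ldots, b_M$ and $g$ in the above equation are determined in two steps.
First, the coefficient $g$ of the comb filter is calculated from the exponentially decaying
envelope of the measured subband impulse response. The vector $[1, 0, \ldots, -g]^T$ is then
used to determine the coefficients $[b_0, b_1, \ldots, b_M]^T$.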

For the calculation of the coefficient $g$, we start with the impulse response of the comb
filter $H(z) = 1/(1 - g z^{-N})$ given by

$$ h(l = Nn) = g^n. \tag{6.79} $$

We further make use of the integrated impulse response

$$ h_e(k) = \sum_{n=k}^{\infty} h(n)^2 \tag{6.80} $$

defined in [Schr65]. This describes the rest energy of the impulse response at time $k$. By
taking the logarithm of $h_e(k)$, a straight line over time index $k$ is obtained. Since the rest
energy of the comb filter response decreases by the factor $g^2$ every N samples, the slope of
this line is $(2/N) \ln g$. From the slope of the straight line we use

$$ \ln g = \frac{N}{2} \cdot \frac{\ln h_e(n_1) - \ln h_e(n_2)}{n_1 - n_2}, \quad n_1 < n_2, \tag{6.81} $$

to determine the coefficient $g$ [Sch94]. For $M = N$, the coefficients in (6.78) of the numerator
polynomial are obtained directly from the impulse response

$$ b_n = h_n, \quad n = 0, 1, \ldots, M - 1, $$

$$ b_M = h_M - g\,h_0. \tag{6.82} $$



Hence, the numerator polynomial of (6.76) is a direct reproduction of the first M samples
of the impulse response (see Fig. 6.33). The denominator polynomial approximates the
further exponentially decaying impulse response. This method is applied to each subband.
The implementation complexity can be reduced by a factor of 10 compared with the direct
implementation of the broad-band impulse response [Sch94]. However, owing to the group
delay caused by the filter bank, this method is not so suitable for real-time applications.
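
The two-step parameter estimation for one subband can be sketched in Python as follows. The comb filter coefficient g is estimated from the slope of the logarithm of the integrated impulse response, where the factor 1/2 accounts for h_e(k) being an energy (squared) quantity, and for M = N the numerator coefficients are then read from the first samples of the impulse response. All function names are assumptions, and the synthetic impulse response merely stands in for a measured subband response.

```python
import math


def schroeder_integral(h):
    """Rest energy h_e(k) = sum over n >= k of h(n)^2, Eq. (6.80)."""
    energy = [0.0] * len(h)
    acc = 0.0
    for k in range(len(h) - 1, -1, -1):
        acc += h[k] * h[k]
        energy[k] = acc
    return energy


def estimate_comb_gain(h, n_delay, n1, n2):
    """Comb filter coefficient g from the slope of ln h_e(k) between n1 and n2."""
    he = schroeder_integral(h)
    slope = (math.log(he[n1]) - math.log(he[n2])) / (n1 - n2)
    return math.exp(0.5 * n_delay * slope)


def fit_subband_model(h, n_delay, n1, n2):
    """Numerator b_0..b_M and g for the case M = N, Eq. (6.82)."""
    g = estimate_comb_gain(h, n_delay, n1, n2)
    b = list(h[:n_delay]) + [h[n_delay] - g * h[0]]
    return b, g


def model_impulse_response(b, g, n_delay, length):
    """Impulse response of (b_0 + ... + b_M z^-M) / (1 - g z^-N)."""
    h = []
    for n in range(length):
        value = b[n] if n < len(b) else 0.0
        if n >= n_delay:
            value += g * h[n - n_delay]
        h.append(value)
    return h


# Synthetic "measurement": known model with N = M = 3 and g = 0.8.
b_true = [1.0, 0.5, -0.25, 0.3]
h_measured = model_impulse_response(b_true, 0.8, 3, 400)
b_est, g_est = fit_subband_model(h_measured, n_delay=3, n1=6, n2=30)
print(b_est, g_est)
```

For a real measurement the logarithmic decay curve is only approximately a straight line, so the choice of n1 and n2 (or a line fit over a range of k) influences the estimate of g.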



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Figure 6.33 Determining model parameters from the measured impulse response. 



6.5 Java Applet - Fast Convolution 

The applet shown in Fig. 6.34 demonstrates audio effects resulting from a fast convolution 
algorithm. It is designed for a first insight into the perceptual effects of convolving an 
impulse response with an audio signal. 

The applet generates an impulse response by modulating the amplitude of a random 
signal. The graphical interface presents the curve of the amplitude modulation, which can 
be manipulated with three control points. Two control points are used for the initial behavior 
of the amplitude modulation. The third control point is used for the exponential decay of 
the impulse response. You can choose between two predefined audio files from our web
server (audio1.wav or audio2.wav) or your own local wav file to be processed [Gui05].



6.6 Exercises 

1. Room Impulse Responses 

1. How can we measure a room impulse response?

2. What kind of test signal is necessary? 

3. How does the length of the impulse response affect the length of the test signal? 
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Figure 6.34 Java applet - fast convolution. 



2. First Reflections 

For a given sound (voice sound) calculate the delay time of a single first reflection. Write a
Matlab program for the following computations.

1. How do we choose this delay time? What coefficient should be used for it?

2. Write an algorithm which performs the convolution of the input mono signal with
two impulse responses which simulate a reflection to the left output $y_L(n)$ and a
second reflection to the right output $y_R(n)$. Check the results by listening to the
output sound.

3. Improve your algorithm to simulate two reflections which can be positioned at any 
angle inside the stereo mix. 

3. Comb and All-pass Filters 

1. Comb Filters: Based on the Schroeder algorithm, draw a signal flow graph for a 
comb filter consisting of a single delay line of M samples with a feedback loop 
containing an attenuation factor g. 

(a) Derive the transfer function of the comb filter. 

(b) Now the attenuation factor g is in the feed-forward path and in the feedback 
loop no attenuation is applied. Why can we consider the impulse response of 
this model to be similar to the previous one? 

(c) In both cases how should we choose the gain factor? What will happen if we 
do otherwise? 

(d) Calculate the reverberation time of the comb filter for $f_S = 44.1$ kHz, $M = 8$
and $g$ specified previously.

(e) Write down what you know about the filter coefficients, plot the pole/zero
locations and the frequency response of the filter.
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2. All-pass Filters: Realize an all-pass structure as suggested by Schroeder. 

(a) Why can we expect a better result with an all-pass filter than with a comb filter?
Write a Matlab function for a comb and all-pass filter with $M = 8, 16$.

(b) Derive the transfer function and show the pole/zero locations, the impulse
response, the magnitude and phase responses.

(c) Perform the filtering of an audio signal with the two filters and estimate the 
delay length M which leads to a perception of a room impression. 

4. Feedback Delay Networks 

Write a Matlab program which realizes a feedback delay network. 

1. What is the reason for a unitary feedback matrix?

2. What is the advantage of using a unitary circulant feedback matrix? 

3. How do you control the reverberation time? 

Chapter 7 

Dynamic Range Control 



The dynamic range of a signal is defined as the logarithmic ratio of maximum to minimum 
signal amplitude and is given in decibels. The dynamic range of an audio signal lies 
between 40 and 120 dB. The combination of level measurement and adaptive signal level 
adjustment is called dynamic range control. Dynamic range control of audio signals is 
used in many applications to match the dynamic behavior of the audio signal to different 
requirements. While recording, dynamic range control protects the AD converter from 
overload or is employed in the signal path to optimally use the full amplitude range of 
a recording system. For suppressing low-level noise, so-called noise gates are used so that 
the audio signal is passed through only from a certain level onwards. While reproducing 
music and speech in a car, shopping center, restaurant or disco the dynamics have to match 
the special noise characteristics of the environment. Therefore the signal level is measured 
from the audio signal and a control signal is derived which then changes the signal level to 
control the loudness of the audio signal. This loudness control is adaptive to the input level. 



7.1 Basics 

Figure 7.1 shows a block diagram of a system for dynamic range control. After measuring 
the input level X_{dB}(n), the output level Y_{dB}(n) is affected by multiplying the delayed input 
signal x(n) by a factor g(n) according to 

y(n) = g(n) \cdot x(n - D). (7.1) 

The delay of the signal x(n) compared with the control signal g(n) allows predictive control 
of the output signal level. This multiplicative weighting is carried out with corresponding 
attack and release time. Multiplication leads, in terms of a logarithmic level representation 
of the corresponding signals, to the addition of the weighting level G_{dB}(n) to the input 
level X_{dB}(n), giving the output level 

Y_{dB}(n) = X_{dB}(n) + G_{dB}(n). (7.2) 

Figure 7.1 System for dynamic range control. 



7.2 Static Curve 



The relationship between input level and weighting level is defined by a static level curve 
G_{dB}(n) = f(X_{dB}(n)). An example of such a static curve is given in Fig. 7.2. Here, the 
output level and the weighting level are given as functions of the input level. 




Figure 7.2 Static curve with the parameters (LT = limiter threshold, CT = compressor threshold, 
ET = expander threshold and NT = noise gate threshold). 

With the help of a limiter, the output level is limited when the input level exceeds 
the limiter threshold LT. All input levels above this threshold lead to a constant output 
level. The compressor maps a change of input level to a certain smaller change of output 
level. In contrast to a limiter, the compressor increases the loudness of the audio signal. 
The expander increases changes in the input level to larger changes in the output level. 
With this, an increase in the dynamics for low levels is achieved. The noise gate is used 
to suppress low-level signals, for noise reduction and also for sound effects like truncating 
the decay of room reverberation. Every threshold used in particular parts of the static curve 
is defined as the lower limit for limiter and compressor and upper limit for expander and 
noise gate. 

In the logarithmic representation of the static curve the compression factor R (ratio) is 
defined as the ratio of the input level change \Delta P_I to the output level change \Delta P_O: 

R = \frac{\Delta P_I}{\Delta P_O}. (7.3) 

With the help of Fig. 7.3 the straight line equation Y_{dB}(n) = CT + R^{-1} (X_{dB}(n) - CT) and 
the compression factor 

R = \frac{X_{dB}(n) - CT}{Y_{dB}(n) - CT} = \tan \beta_C (7.4) 

are obtained, where the angle \beta is defined as shown in Fig. 7.2. The relationship between 
the ratio R and the slope S can also be derived from Fig. 7.3 and is expressed as 

S = 1 - \frac{1}{R}, (7.5) 

R = \frac{1}{1 - S}. (7.6) 

Figure 7.3 Compressor curve (compressor ratio CR/slope CS). 

Typical compression factors are 

R = \infty, limiter, 

R > 1, compressor (CR: compressor ratio), 

0 < R < 1, expander (ER: expander ratio), 

R = 0, noise gate. 

The transition from logarithmic to linear representation leads, from (7.4), to 

R = \frac{\log_{10}(x(n)/c_T)}{\log_{10}(y(n)/c_T)}, (7.7), (7.8) 

where x(n) and y(n) are the linear levels and c_T denotes the linear compressor threshold. 
Rewriting (7.8) gives the linear output level 

\frac{y(n)}{c_T} = 10^{(1/R) \log_{10}(x(n)/c_T)} = \left( \frac{x(n)}{c_T} \right)^{1/R}, 

y(n) = c_T^{1 - 1/R} \cdot x^{1/R}(n) (7.9) 

as a function of input level. The control factor g(n) can be calculated by the quotient 

g(n) = \frac{y(n)}{x(n)} = \left( \frac{x(n)}{c_T} \right)^{1/R - 1}. (7.10) 

With the help of tables and interpolation methods, it is possible to determine the control 
factor without taking logarithms and antilogarithms. The implementation described as 
follows, however, makes use of the logarithm of the input level and calculates the control 
level G_{dB}(n) with the help of the straight line equation. The antilogarithm leads to the 
value f(n) which gives the control factor g(n) with corresponding attack and release time 
(see Fig. 7.1). 
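
The straight line equation and the antilogarithm can be illustrated with a minimal Python sketch for the compressor branch only. The function names, the threshold of -20 dB and the ratio R = 4 are assumptions chosen for illustration and are not taken from the text; levels are assumed to be 20 log10 of the linear level.

```python
import math

def static_curve_gain_db(x_db, ct_db=-20.0, ratio=4.0):
    """Weighting level G_dB for a compressor-only static curve (assumed parameters)."""
    if x_db <= ct_db:
        return 0.0                       # below the threshold: linear region, 0 dB
    slope = 1.0 - 1.0 / ratio            # S = 1 - 1/R, cf. (7.5)
    return -slope * (x_db - ct_db)       # G_dB = Y_dB - X_dB on the straight line

def control_factor(x_db, ct_db=-20.0, ratio=4.0):
    """Antilogarithm f(n) = 10^(G_dB/20) of the weighting level."""
    return 10.0 ** (static_curve_gain_db(x_db, ct_db, ratio) / 20.0)

# Input level 12 dB above the threshold, R = 4: the output rises by only 3 dB.
x_db = -8.0
g_db = static_curve_gain_db(x_db)
y_db = x_db + g_db                       # addition of levels, cf. (7.2)
```

Under these assumptions the result agrees with the linear form (7.10), since 10^(G_dB/20) equals (x/c_T)^(1/R - 1) when x and c_T are the linear values of X_dB and CT.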



7.3 Dynamic Behavior 

Besides the static curve of dynamic range control, the dynamic behavior in terms of attack 
and release times plays a significant role in sound quality. The rapidity of dynamic range 
control depends also on the measurement of PEAK and RMS values [McN84, Sti86]. 



7.3.1 Level Measurement 

Level measurements [McN84] can be made with the systems shown in Figs 7.4 and 7.5. 
For PEAK measurement, the absolute value of the input is compared with the peak value 
x_{PEAK}(n). If the absolute value is greater than the peak value, the difference is weighted 
with the coefficient AT (attack time) and added to (1 - AT) \cdot x_{PEAK}(n - 1). For this attack 
case |x(n)| > x_{PEAK}(n - 1) we get the difference equation (see Fig. 7.4) 

x_{PEAK}(n) = (1 - AT) \cdot x_{PEAK}(n - 1) + AT \cdot |x(n)| (7.11) 

with the transfer function 

H(z) = \frac{AT}{1 - (1 - AT) z^{-1}}. (7.12) 

If the absolute value of the input is smaller than the peak value, |x(n)| \le x_{PEAK}(n - 1) (the 
release case), the new peak value is given by 

x_{PEAK}(n) = (1 - RT) \cdot x_{PEAK}(n - 1) (7.13) 

with the release time coefficient RT. The difference signal of the input will be muted by 
the nonlinearity such that the difference equation for the peak value is given according to 
(7.13). For the release case the transfer function 

H(z) = \frac{1}{1 - (1 - RT) z^{-1}} (7.14) 

is valid. For the attack case the transfer function (7.12) with coefficient AT is used and for 
the release case the transfer function (7.14) with the coefficient RT. The coefficients (see 
Section 7.3.3) are given by 

AT = 1 - \exp\left( \frac{-2.2 T_S}{t_a / 1000} \right), (7.15) 

RT = 1 - \exp\left( \frac{-2.2 T_S}{t_r / 1000} \right), (7.16) 

where the attack time t_a and the release time t_r are given in milliseconds (T_S sampling 
interval). With this switching between filter structures one achieves fast attack responses 
for increasing input signal amplitudes and slow decay responses for decreasing input signal 
amplitudes. 
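
The switching between the attack recursion (7.11) and the release recursion (7.13) can be sketched in a few lines of Python. This is only an illustration: the function names and the default times of 1 ms and 100 ms are assumptions, not values from the text.

```python
import math

def time_coefficient(t_ms, fs):
    """Coefficient 1 - exp(-2.2 T_S / (t/1000)) as in (7.15) and (7.16)."""
    return 1.0 - math.exp(-2.2 / (fs * t_ms / 1000.0))

def peak_measure(x, fs, t_attack_ms=1.0, t_release_ms=100.0):
    """PEAK measurement with separate attack and release coefficients."""
    at = time_coefficient(t_attack_ms, fs)
    rt = time_coefficient(t_release_ms, fs)
    peak, out = 0.0, []
    for sample in x:
        a = abs(sample)
        if a > peak:
            peak = (1.0 - at) * peak + at * a    # attack case (7.11)
        else:
            peak = (1.0 - rt) * peak             # release case (7.13)
        out.append(peak)
    return out

# A 10 ms burst followed by 100 ms of silence at fs = 44.1 kHz.
fs = 44100
envelope = peak_measure([1.0] * 441 + [0.0] * 4410, fs)
```

With these assumed settings the envelope follows the burst within a few milliseconds and then decays to roughly exp(-2.2), about 11 %, of its peak after one release time.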




Figure 7.4 PEAK measurement. 

Figure 7.5 RMS measurement (TAV = averaging coefficient). 

The computation of the RMS value 

x_{RMS}(n) = \sqrt{ \frac{1}{N} \sum_{i=0}^{N-1} x^2(n - i) } (7.17) 

over N input samples can be achieved by a recursive formulation. The RMS measurement 
shown in Fig. 7.5 uses the square of the input and performs averaging with a first-order 
low-pass filter. The averaging coefficient 

TAV = 1 - \exp\left( \frac{-2.2 T_S}{t_M / 1000} \right) (7.18) 

is determined according to the time constant calculation discussed in Section 7.3.3, where 
t_M is the averaging time in milliseconds. The difference equation is given by 

x_{RMS}^2(n) = (1 - TAV) \cdot x_{RMS}^2(n - 1) + TAV \cdot x^2(n) (7.19) 

with the transfer function 

H(z) = \frac{TAV}{1 - (1 - TAV) z^{-1}}. (7.20) 
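
As a rough illustration of the recursive formulation (7.19), the following Python sketch averages the squared input with a first-order low-pass and takes the square root at the end. The function name, the averaging time of 50 ms and the test signal are assumptions made for this example.

```python
import math

def rms_measure(x, fs, t_avg_ms=50.0):
    """Recursive RMS measurement: first-order averaging of x^2, cf. (7.19)."""
    tav = 1.0 - math.exp(-2.2 / (fs * t_avg_ms / 1000.0))    # (7.18)
    mean_square, out = 0.0, []
    for sample in x:
        mean_square = (1.0 - tav) * mean_square + tav * sample * sample
        out.append(math.sqrt(mean_square))
    return out

# One second of a 1 kHz sine with amplitude 1 at fs = 48 kHz.
fs = 48000
sine = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs)]
rms = rms_measure(sine, fs)
```

For a sine of amplitude 1 the measured value settles near 1/sqrt(2), with a small ripple that depends on the chosen averaging time.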
7.3.2 Gain Factor Smoothing 

Attack and release times can be implemented by the system shown in Fig. 7.6 [McN84]. 
The attack coefficient AT or release coefficient RT is obtained by comparing the input 
control factor and the previous one. A small hysteresis curve determines whether the control 
factor is in the attack or release state and hence gives the coefficient AT or RT. The system 
also serves to smooth the control signal. The difference equation is given by 

g(n) = (1 - k) \cdot g(n - 1) + k \cdot f(n), (7.21) 

with k = AT or k = RT, and the corresponding transfer function leads to 

H(z) = \frac{k}{1 - (1 - k) z^{-1}}. (7.22) 

Figure 7.6 Implementing attack and release time or gain factor smoothing. 
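
A simplified Python sketch of (7.21) is given below. It assumes that the attack state is the one in which the control factor has to fall, and it replaces the hysteresis curve by a plain comparison of f(n) with g(n - 1); both choices, the function name and the coefficient values are assumptions for illustration.

```python
def smooth_gain(f, at, rt):
    """Smooth the control factor f(n) with k = AT or k = RT, cf. (7.21)."""
    g, out = 1.0, []
    for target in f:
        # Assumed rule without hysteresis: falling gain = attack, otherwise release.
        k = at if target < g else rt
        g = (1.0 - k) * g + k * target
        out.append(g)
    return out

# The control factor drops to 0.5 for 20 samples and then returns to 1.
f = [1.0] * 10 + [0.5] * 20 + [1.0] * 20
g = smooth_gain(f, at=0.5, rt=0.1)
```

With a large attack coefficient and a small release coefficient the gain reduction is reached within a few samples, whereas the return to unity gain is much slower.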



7.3.3 Time Constants 

If the step response of a continuous-time system is 

g(t) = 1 - e^{-t/\tau}, \tau = time constant, (7.23) 

then sampling (step-invariant transform) the step response gives the discrete-time step 
response 

g(nT_S) = \varepsilon(nT_S) - e^{-nT_S/\tau} = 1 - z_\infty^n, z_\infty = e^{-T_S/\tau}. (7.24) 

The Z-transform leads to 

G(z) = \frac{z}{z - 1} - \frac{1}{1 - z_\infty z^{-1}} = \frac{1 - z_\infty}{(z - 1)(1 - z_\infty z^{-1})}. (7.25) 

With the definition of attack time t_a = t_{90} - t_{10}, we derive 

0.1 = 1 - e^{-t_{10}/\tau} \Rightarrow t_{10} \approx 0.1\tau, (7.26) 

0.9 = 1 - e^{-t_{90}/\tau} \Rightarrow t_{90} \approx 2.3\tau. (7.27) 

The relationship between attack time t_a and the time constant \tau of the step response is 
obtained as follows: 

0.9/0.1 = e^{(t_{90} - t_{10})/\tau} 

\ln(0.9/0.1) = (t_{90} - t_{10})/\tau 

t_a = t_{90} - t_{10} = 2.2\tau. (7.28) 

Hence, the pole is calculated as 

z_\infty = e^{-2.2 T_S/t_a}. (7.29) 

A system for implementing the given step response is obtained by the relationship between 
the Z-transform of the impulse response and the Z-transform of the step response: 

H(z) = \frac{z - 1}{z} G(z). (7.30) 

The transfer function can now be written as 

H(z) = \frac{(1 - z_\infty) z^{-1}}{1 - z_\infty z^{-1}} (7.31) 

with the pole z_\infty = e^{-2.2 T_S/t_a} adjusting the attack, release or averaging time. For the 
coefficients of the corresponding time constant filters the attack case is given by (7.15), the 
release case by (7.16) and the averaging case by (7.18). Figure 7.7 shows an example where 
the dotted lines mark the t_{10} and t_{90} time. 
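
The relation t_a = 2.2\tau can be checked numerically. The Python sketch below, with assumed function names and an assumed attack time of 10 ms at 44.1 kHz, feeds a unit step through the first-order low-pass with pole z_\infty from (7.29) and measures the time between the 10 % and 90 % points. It uses the recursion without the extra delay z^{-1} of (7.31), which shifts the response by one sample but does not change the rise time.

```python
import math

def pole_from_time(t_ms, fs):
    """Pole z_inf = exp(-2.2 T_S / t) with t given in milliseconds, cf. (7.29)."""
    return math.exp(-2.2 / (fs * t_ms / 1000.0))

def rise_time_ms(t_ms, fs, n_max=1000000):
    """Measure t90 - t10 of the unit step response of the first-order low-pass."""
    z_inf = pole_from_time(t_ms, fs)
    g, n10 = 0.0, None
    for n in range(n_max):
        g = z_inf * g + (1.0 - z_inf)     # step response sample
        if n10 is None and g >= 0.1:
            n10 = n
        if g >= 0.9:
            return (n - n10) * 1000.0 / fs
    raise ValueError("step response did not reach 90 % within n_max samples")

measured = rise_time_ms(10.0, 44100)
```

The measured value comes out slightly below the nominal 10 ms because ln(9) is about 2.197 rather than 2.2 and because the crossing points are quantized to whole samples.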

7.4 Implementation 

The programming of a system for dynamic range control is described in the following 
sections. 

7.4.1 Limiter 

The block diagram of a limiter is presented in Fig. 7.8. The signal x_{PEAK}(n) is determined 
from the input with variable attack and release time. The logarithm to the base 2 of this peak 
signal is taken and compared with the limiter threshold. If the signal is above the threshold, 
the difference is multiplied by the negative slope of the limiter LS. Then the antilogarithm 
of the result is taken. The control factor f(n) obtained is then smoothed with a first-order 
low-pass filter (SMOOTH). If the signal x_{PEAK}(n) lies below the limiter threshold, the 
signal f(n) is set to f(n) = 1. The delayed input x(n - D_1) is multiplied by the smoothed 
control factor g(n) to give the output y(n). 
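
The processing chain described above can be sketched in Python as follows. This is a minimal sketch, not the book's implementation: the function name, the linear threshold of 0.5, the time settings, the single smoothing coefficient and the delay of 5 samples are assumptions. The limiter slope is taken as LS = 1, which follows from S = 1 - 1/R with R = \infty.

```python
import math

def limiter(x, fs, lt=0.5, t_attack_ms=0.2, t_release_ms=100.0,
            t_smooth_ms=1.0, delay=5):
    """Peak limiter sketch: PEAK measurement, log2, static curve, antilog, SMOOTH."""
    coeff = lambda t_ms: 1.0 - math.exp(-2.2 / (fs * t_ms / 1000.0))
    at, rt, k = coeff(t_attack_ms), coeff(t_release_ms), coeff(t_smooth_ms)
    lt_log = math.log2(lt)               # limiter threshold in the log2 domain
    ls = 1.0                             # limiter slope for R = infinity
    peak, g = 0.0, 1.0
    buffer = [0.0] * delay               # delay line for x(n - D_1)
    y = []
    for sample in x:
        a = abs(sample)
        peak = (1.0 - at) * peak + at * a if a > peak else (1.0 - rt) * peak
        if peak > lt:
            f = 2.0 ** (-ls * (math.log2(peak) - lt_log))   # antilogarithm
        else:
            f = 1.0                      # below threshold: no gain change
        g = (1.0 - k) * g + k * f        # first-order smoothing
        buffer.append(sample)
        y.append(g * buffer.pop(0))
    return y

fs = 44100
loud = limiter([1.0] * (fs // 2), fs)    # constant level above the threshold
quiet = limiter([0.1] * 1000, fs)        # constant level below the threshold
```

In this sketch a constant input of level 1 is reduced to approximately the threshold 0.5, while an input below the threshold passes unchanged apart from the delay.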

7.4.2 Compressor, Expander, Noise Gate 

The block diagram of a compressor/expander/noise gate is shown in Fig. 7.9. The basic 
structure is similar to the limiter. In contrast to the limiter, the logarithm of the signal 
x_{RMS}(n) is taken and multiplied by 0.5. The value obtained is compared with three 
thresholds in order to determine the operating range of the static curve. If one of the three 
thresholds is crossed, the resulting difference is multiplied by the corresponding slope 
(CS, ES, NS) and the antilogarithm of the result is taken. A first-order low-pass filter 
subsequently provides the attack and release time. 

Figure 7.7 Attack and release behavior for time-constant filters (upper panel: t_a = 10.00 ms, 
t_r = 80.00 ms; lower panel: t_a = 0.20 ms, t_r = 200.00 ms; horizontal axis: t in ms). 

Figure 7.8 Limiter. 

Figure 7.10 Limiter/compressor/expander/noise gate. 



7.4.3 Combination System 

A combination of a limiter that uses PEAK measurement, and a compressor/expander/noise 
gate that is based on RMS measurement, is presented in Fig. 7.10. The PEAK and RMS 
values are measured simultaneously. If the linear threshold of the limiter is crossed, the 
logarithm of the peak signal x_{PEAK}(n) is taken and the upper path of the limiter is used 
to calculate the characteristic curve. If the limiter threshold is not crossed, the logarithm 
of the RMS value is taken and one of the three lower paths is used. The additive terms in 
the limiter and noise gate paths result from the static curve. After going through the range 
detector, the antilogarithm is taken. The sequence f(n) is smoothed with a SMOOTH filter 
in the limiter case, or weighted with corresponding attack and release times of the relevant 
operating range (compressor, expander or noise gate). By limiting the maximum level, the 
dynamic range is reduced. As a consequence, the overall static curve can be shifted up by 
a gain factor. Figure 7.11 demonstrates this with a gain factor equal to 10 dB. This static 
parameter value is directly included in the control factor g(n). 

Figure 7.11 Shifting the static curve by a gain factor. 

As an example, Fig. 7.12 illustrates the input x(n), the output y(n) and the control factor 
g(n) of a compressor/expander system. It is observed that signals with high amplitude are 
compressed and those with low amplitude are expanded. An additional gain of 12 dB 
shows the maximum value of 4 for the control factor g(n). The compressor/expander 
system operates in the linear region of the static curve if the control factor is equal to 
4. If the control factor is between 1 and 4, the system operates as a compressor. For 
control factors lower than 1, the system works as an expander (3500 < n < 4500 and 
6800 < n < 7900). The compressor is responsible for increasing the loudness of the signal, 
whereas the expander increases the dynamic range for signals of small amplitude. 



7.5 Realization Aspects 

7.5.1 Sampling Rate Reduction 

In order to reduce the computational complexity, downsampling can be carried out after 
calculating the PEAK/RMS value (see Fig. 7.13). As the signals x_{PEAK}(n) and x_{RMS}(n) are 
already band-limited, they can be directly downsampled by taking every second or fourth 
value of the sequence. This downsampled signal is then processed by taking its logarithm, 
calculating the static curve, taking the antilogarithm and filtering with corresponding attack 
and release time with reduced sampling rate. The following upsampling by a factor of 
4 is achieved by repeating the output value four times. This procedure is equivalent to 
upsampling by a factor of 4 followed by a sample-and-hold transfer function. 

Figure 7.12 Signals x(n), y(n) and g(n) for dynamic range control. 

The nesting and spreading of partial program modules over four sampling periods is 
shown in Fig. 7.14. The modules PEAK/RMS (i.e. PEAK/RMS calculation) and MULT 
(delay of input and multiplication with g(n)) are performed every input sampling period. 
The number of processor cycles for PEAK/RMS and MULT are denoted by Z1 and Z3 
respectively. The modules LD(X), CURVE, 2^X and SMO have a maximum of Z2 processor 
cycles and are processed consecutively in the given order. This procedure is repeated every 
four sampling periods. The total number of processor cycles per sampling period for the 
complete dynamics algorithm results from the sum of all three modules. 

Figure 7.13 Dynamic system with sampling rate reduction. 

Figure 7.14 Nesting technique. 
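
The reduced-rate control path can be sketched in Python as follows. The function name and the placeholder gain function are assumptions; smoothing at the reduced rate is left out to keep the example short.

```python
def control_path_reduced(level, gain_fn, factor=4):
    """Evaluate gain_fn on every `factor`-th level value and hold the result."""
    g = []
    for n in range(0, len(level), factor):
        value = gain_fn(level[n])                  # log, static curve, antilog
        hold = min(factor, len(level) - n)         # last block may be shorter
        g.extend([value] * hold)                   # sample-and-hold upsampling
    return g

# Placeholder gain function (assumption): halve the gain for levels above 0.5.
gains = control_path_reduced([0.1, 0.2, 0.9, 0.8, 0.7, 0.6],
                             lambda v: 0.5 if v > 0.5 else 1.0)
```

Here the gain function is evaluated only for the first and fifth level value, and each result is repeated until the next evaluation.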



7.5.2 Curve Approximation 

Besides taking logarithms and antilogarithms, other simple operations like comparisons and 
addition/multiplication occur in calculating the static curve. The logarithm of the 
PEAK/RMS value is taken as follows: 

x = M \cdot 2^E, (7.32) 

ld(x) = ld(M) + E. (7.33) 

First, the mantissa is normalized and the exponent is determined. The function ld(M) is 
then calculated by a series expansion. The exponent is simply added to the result. 

The logarithmic weighting factor G and the antilogarithm 2^G are given by 

G = -E - M, (7.34) 

2^G = 2^{-E} \cdot 2^{-M}. (7.35) 

Here, E is a natural number and M is a fractional number. The antilogarithm 2^G is 
calculated by expanding the function 2^{-M} in a series and multiplying by 2^{-E}. A reduction of 
computational complexity can be achieved by directly using log and antilog tables. 
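
A floating-point illustration of (7.32) to (7.35) in Python is given below. The particular series (an artanh series for the logarithm and a Taylor series for 2^{-M}) and the number of terms are assumptions for this sketch; the text does not specify which expansions are used.

```python
import math

def ld(x, terms=6):
    """ld(x) = ld(M) + E with x = M * 2^E, cf. (7.32) and (7.33)."""
    m, e = math.frexp(x)                  # mantissa 0.5 <= m < 1 and exponent e
    u = (m - 1.0) / (m + 1.0)
    # ln(m) = 2 * (u + u^3/3 + u^5/5 + ...), assumed series expansion
    ln_m = 2.0 * sum(u ** (2 * k + 1) / (2 * k + 1) for k in range(terms))
    return ln_m / math.log(2.0) + e

def antilog2(g, terms=10):
    """2^G = 2^(-E) * 2^(-M) with G = -E - M, cf. (7.34) and (7.35)."""
    e = math.floor(-g)                    # integer part E
    m = -g - e                            # fractional part M, 0 <= M < 1
    t = -m * math.log(2.0)
    series = sum(t ** k / math.factorial(k) for k in range(terms))   # 2^(-M)
    return math.ldexp(series, -e)         # multiply by 2^(-E)

level_log = ld(0.3)
factor = antilog2(-2.5)
```

On a fixed-point signal processor the same split into exponent and mantissa would be obtained by normalizing the data word; the use of `math.frexp` here only mimics that step.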

7.5.3 Stereo Processing 

For stereo processing, a common control factor g(n) is needed. If different control factors 
are used for both channels, limiting or compressing one of the two stereo signals causes a 
displacement of the stereo balance. Figure 7.15 shows a stereo dynamic system in which the 
sum of the two signals is used to calculate a common control factor g(n). The following 
processing steps of measuring the PEAK/RMS value, downsampling, taking the logarithm, 
calculating the static curve, taking the antilogarithm, applying attack and release times and 
upsampling with a sample-and-hold function remain the same. The delay (DEL) in the 
direct path must be the same for both channels. 




Figure 7.15 Stereo dynamic system. 



7.6 Java Applet - Dynamic Range Control 

The applet shown in Fig. 7.16 demonstrates dynamic range control. It is designed for a 
first insight into the perceptual effects of dynamic range control of an audio signal. You 
can adjust the characteristic curve with two control points. You can choose between two 
predefined audio files from our web server (audio1.wav or audio2.wav) or your own local 
wav file to be processed [Gui05]. 

Figure 7.16 Java applet - dynamic range control. 



7.7 Exercises 

1. Low-pass Filtering for Envelope Detection 

Generally, envelope computation is performed by low-pass filtering the input signal's 
absolute value or its square. 

1. Sketch the block diagram of a recursive first-order low-pass H(z) = \lambda / [1 - (1 - \lambda) z^{-1}]. 

2. Sketch its step response. What characteristic measure of the envelope detector can 
be derived from the step response and how? 

3. Typically, the low-pass filter is modified to use a non-constant filter coefficient \lambda. 
How does \lambda depend on the signal? Sketch the response to a rect signal of the 
low-pass filter thus modified. 

2. Discrete-time Specialties of Envelope Detection 

Taking absolute value or squaring are non-linear operations. Therefore, care must be taken 
when using them in discrete-time systems as they introduce harmonics the frequency of 
which may violate the Nyquist bound. This can lead to unexpected results, as a simple 
example illustrates. Consider the input signal x(n) = \sin(\frac{\pi}{2} n + \varphi), \varphi \in [0, 2\pi]. 

1. Sketch x(n), |x(n)| and x^2(n) for different values of \varphi. 

2. Determine the value of the envelope after perfect low-pass filtering, i.e. averaging, 
|x(n)|. Note: As the input signal is periodic, it is sufficient to consider one 
period, e.g. 

\bar{x} = \frac{1}{4} \sum_{n=0}^{3} |x(n)|. 

3. Similarly, determine the value of the envelope after averaging x^2(n). 

3. Dynamic Range Processors 

Sketch the characteristic curves mapping input level to output level and input level to gain 
for and describe briefly the application of: 

1. limiter; 

2. compressor; 

3. expander; 

4. noise gate. 
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Chapter 8 

Sampling Rate Conversion 



Several different sampling rates are established for digital audio applications. For broad- 
casting, professional and consumer audio, sampling rates of 32, 48 and 44.1 kHz are used. 
Moreover, other sampling rates are derived from different frame rates for film and video. In 
connecting systems with different uncoupled sampling rates, there is a need for sampling 
rate conversion. In this chapter, synchronous sampling rate conversion with rational factor 
L/M for coupled clock rates and asynchronous sampling rate conversion will be discussed 
where the different sampling rates are not synchronized with each other. 



8.1 Basics 

Sampling rate conversion consists of upsampling and downsampling and anti-imaging 
and anti-aliasing filtering [Cro83, Vai93, Fli00, Opp99]. The discrete-time Fourier 
transform of the sampled signal x(n) with sampling frequency f_S = 1/T (\omega_S = 2\pi f_S) is given 
by 

X(e^{j\Omega}) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jk\frac{2\pi}{T} \right), \Omega = \omega T, \frac{2\pi}{T} = \omega_S, (8.1) 

with the Fourier transform X_a(j\omega) of the continuous-time signal x(t). For ideal sampling 
the condition 

X(e^{j\Omega}) = \frac{1}{T} X_a(j\omega), |\Omega| \le \pi, (8.2) 

holds. 

8.1.1 Upsampling and Anti-imaging Filtering 

For upsampling the signal 

x(n) \circ\!-\!\bullet X(e^{j\Omega}) (8.3) 

by a factor L, L - 1 zero samples will be included between consecutive samples (see 
Fig. 8.1). This leads to the upsampled signal 

w(m) = \begin{cases} x(m/L), & m = 0, \pm L, \pm 2L, \ldots, \\ 0, & \text{otherwise}, \end{cases} (8.4) 

with sampling frequency f'_S = 1/T' = L \cdot f_S = L/T (\Omega' = \Omega/L) and the corresponding 
Fourier transform 

W(e^{j\Omega'}) = \sum_{m=-\infty}^{\infty} w(m) e^{-jm\Omega'} = \sum_{m=-\infty}^{\infty} x(m) e^{-jmL\Omega'} = X(e^{jL\Omega'}). (8.5) 

The suppression of the image spectra is achieved by anti-imaging filtering of w(m) with 
h(m), such that the output signal is given by 

y(m) = w(m) * h(m), (8.6) 

Y(e^{j\Omega'}) = H(e^{j\Omega'}) \cdot X(e^{j\Omega' L}). (8.7) 

To adjust the signal power in the base-band the Fourier transform of the impulse response 

H(e^{j\Omega'}) = \begin{cases} L, & |\Omega'| \le \pi/L, \\ 0, & \text{otherwise}, \end{cases} (8.8) 

needs a gain factor L in the pass-band, such that the output signal y(m) has the Fourier 
transform given by 

Y(e^{j\Omega'}) = L X(e^{j\Omega' L}) (8.9) 

= L \frac{1}{T} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jLk\frac{2\pi}{T} \right) (8.10) 

with (8.1) and (8.5) 

= L \frac{1}{LT'} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jLk\frac{2\pi}{LT'} \right) (8.11) 

= \frac{1}{T'} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jk\frac{2\pi}{T'} \right), (8.12) 

which is the spectrum of a signal with f'_S = L f_S. The output signal represents the sampling 
of the input x(t) with sampling frequency f'_S = L f_S. 
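
Zero insertion according to (8.4) followed by an anti-imaging filter with pass-band gain L can be sketched in Python as follows. The ideal low-pass (8.8) is replaced here by a Hann-windowed sinc of finite length, and the filter is applied without delay (non-causally); both are assumptions made to keep the example short, as are the function and parameter names.

```python
import math

def upsample(x, L, periods=8):
    """Upsample x by L: zero insertion (8.4) and windowed-sinc anti-imaging filter."""
    K = periods * L                               # filter has 2K + 1 taps
    h = []
    for n in range(-K, K + 1):
        t = n / L
        sinc = 1.0 if n == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 + 0.5 * math.cos(math.pi * n / (K + 1))   # Hann window
        h.append(sinc * window)                   # gain L times (1/L) sinc(n/L)
    w = [0.0] * (len(x) * L)
    w[::L] = x                                    # L - 1 zeros between samples
    y = []
    for m in range(len(w)):
        acc = 0.0
        for i, c in enumerate(h):
            j = m + K - i                         # y(m) = sum_n h(n) w(m - n)
            if 0 <= j < len(w):
                acc += c * w[j]
        y.append(acc)
    return y

x = [0.0, 1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.1]
y = upsample(x, 4)
```

Because the windowed sinc is zero at all non-zero multiples of L and equal to one at the origin, the original samples reappear unchanged at every L-th output position, which is only possible with the gain factor L in the pass-band.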



8.1.2 Downsampling and Anti-aliasing Filtering 

For downsampling a signal x(n) by M the signal has to be band-limited to \pi/M in order to 
avoid aliasing after the downsampling operation (see Fig. 8.2). Band-limiting is achieved 

Figure 8.1 Upsampling by L and anti-imaging filtering in time and frequency domain. 

and for the Fourier transform of the output signal we can derive 

Y(e^{j\Omega'}) = \frac{1}{M} X(e^{j\Omega'/M}) = \frac{1}{M} \frac{1}{T} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jk\frac{2\pi}{MT} \right) (8.19) 

with (8.1) 

= \frac{1}{T'} \sum_{k=-\infty}^{\infty} X_a\left( j\omega + jk\frac{2\pi}{T'} \right), (8.20) 

which is the spectrum of a signal with f'_S = f_S/M and represents a sampled signal y(n) 
with f'_S = f_S/M. 

Figure 8.2 Anti-aliasing filtering and downsampling by M in time and frequency domain. 



8.2 Synchronous Conversion 

Sampling rate conversion for coupled sampling rates by a rational factor L/M can be 
performed by the system shown in Fig. 8.3. After upsampling by a factor L, anti-imaging 
filtering at L f_S is carried out, followed by downsampling by factor M. Since after 
upsampling and filtering only every Mth sample is used, it is possible to develop efficient 
algorithms that reduce complexity. In this respect two methods are in use: one is based on 
a time-domain interpretation [Cro83] and the other [Hsi87] uses Z-domain fundamentals. 
Owing to its computational efficiency, only the method in the Z-domain will be considered. 

Figure 8.3 Sampling rate conversion by factor L/M. 

Starting with the finite impulse response h(n) of length N and its Z-transform 

H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}, (8.21) 

the polyphase representation [Cro83, Vai93, Fli00] with M components can be expressed as 

H(z) = \sum_{k=0}^{M-1} z^{-k} E_k(z^M) (8.22) 

with 

e_k(n) = h(nM + k), k = 0, 1, \ldots, M - 1, (8.23) 

or 

H(z) = \sum_{k=0}^{M-1} z^{-(M-1-k)} R_k(z^M) (8.24) 

with 

r_k(n) = h(nM + M - 1 - k), k = 0, 1, \ldots, M - 1. (8.25) 

The polyphase decomposition as given in (8.22) and (8.24) is referred to as type 1 and 2, 
respectively. The type 1 polyphase decomposition corresponds to a commutator model in the 
anti-clockwise direction whereas the type 2 is in the clockwise direction. The relationship 
between R(z) and E(z) is described by 

R_k(z) = E_{M-1-k}(z). (8.26) 

With the help of the identities [Vai93] shown in Fig. 8.4 and the decomposition (Euclid's 
theorem) 

z^{-1} = z^{-pL} z^{qM} (8.27) 

it is possible to move the inner delay elements of Fig. 8.5. Equation (8.27) is valid if M 
and L are relatively prime, i.e. if integers p and q with pL - qM = 1 exist. In a cascade of 
upsampling and downsampling, the order of functional blocks can be exchanged (see Fig. 8.5b). 

The use of polyphase decomposition can be demonstrated with the help of an example 
for L = 2 and M = 3. This implies a sampling rate conversion from 48 kHz to 32 kHz. 
Figures 8.6 and 8.7 show two different solutions for polyphase decomposition of 
sampling rate conversion by 2/3. Further decompositions of the upsampling decomposition of 
Fig. 8.7 are demonstrated in Fig. 8.8. First, interpolation is implemented with a polyphase 
decomposition and the delay z^{-1} is decomposed to z^{-1} = z^{-2} z^{3}. Then, the downsampler 
of factor 3 is moved through the adder into the two paths (Fig. 8.8b) and the delays are 
moved according to the identities of Fig. 8.4. In Fig. 8.8c, the upsampler is exchanged with 
the downsampler, and in a final step (Fig. 8.8d) another polyphase decomposition of E_0(z) 
and E_1(z) is carried out. The actual filter operations E_{0k}(z) and E_{1k}(z) with k = 0, 1, 2 are 
performed at 1/3 of the input sampling rate. 

Figure 8.4 Identities for sampling rate conversion. 

Figure 8.5 Decomposition in accordance with Euclid's theorem. 
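
The benefit of the type 1 polyphase decomposition (8.22) can be illustrated for the downsampling part alone. The Python sketch below, with assumed function names and an arbitrary short filter, compares filtering followed by downsampling with the polyphase form, in which every branch filter e_k(n) = h(nM + k) operates at the reduced rate. It does not reproduce the complete 2/3 structure of Fig. 8.8.

```python
def polyphase_components(h, M):
    """Type 1 components e_k(n) = h(nM + k), cf. (8.23)."""
    return [h[k::M] for k in range(M)]

def decimate_direct(x, h, M):
    """Evaluate y(m) = sum_i h(i) x(mM - i), i.e. filter and keep every M-th sample."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(0, len(x), M)]

def decimate_polyphase(x, h, M):
    """Same output, computed branch by branch at the low sampling rate."""
    n_out = (len(x) + M - 1) // M
    y = [0.0] * n_out
    for k, e_k in enumerate(polyphase_components(h, M)):
        for m in range(n_out):
            for n, c in enumerate(e_k):
                idx = (m - n) * M - k         # branch input x(mM - k), delayed by n
                if 0 <= idx < len(x):
                    y[m] += c * x[idx]
    return y

h = [0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.1]
x = [float((7 * n) % 5) for n in range(20)]
y_direct = decimate_direct(x, h, 3)
y_poly = decimate_polyphase(x, h, 3)
```

Both functions return the same samples; the polyphase form merely reorders the sum so that each branch filter sees only every M-th input sample.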



8.3 Asynchronous Conversion 

Plesiochronous systems consist of partial systems with different and uncoupled sampling 
rates. Sampling rate conversion between such systems can be achieved through a DA 
conversion with the sampling rate of the first system followed by an AD conversion with the 
sampling rate of the second system. A digital approximation of this approach is made with 
a multirate system [Lag81, Lag82a, Lag82b, Lag82c, Lag83, Ram82, Ram84]. Figure 8.9a 
shows a system for increasing the sampling rate by a factor L followed by an anti-imaging 
filter H(z) and a resampling of the interpolated signal y(k). The samples y(k) are held 
for a clock period (see Fig. 8.9c) and then sampled with output clock period T_{S_O} = 1/f_{S_O}. 
The interpolation sampling rate must be increased sufficiently that the difference of two 
consecutive samples y(k) is smaller than the quantization step Q. The sample-and-hold 
function applied to y(k) suppresses the spectral images at multiples of L f_S (see Fig. 8.9b). 
The signal obtained is a band-limited continuous-time signal which can be sampled with 
output sampling rate f_{S_O}. 

Figure 8.7 Polyphase decomposition for upsampling L/M = 2/3. 

For the calculation of the necessary oversampling rate, the problem is considered in 
the frequency domain. The sinc function of a sample-and-hold system (see Fig. 8.9b) at 
frequency f = (L - 1) f_S is given by 

Figure 8.9 Approximation of DA/AD conversions. 



This value of (8.29) should be lower than % and allows the computation of the interpolation 
factor L. For a given word-length w and quantization step Q, the necessary interpolation 
rate L is calculated by 



e>J_ 

2 ~ 2 L 

2 _ 2 L 

^ L > 2 w ~ l . 



For a linear interpolation between upsampled samples y(k), we can derive 



E(f) = 






r\2 



(i) 



( 7 T(L-l,)fs\ 2 

\ L fs ) 



1 

QLf-' 



(8.30) 

(8.31) 

(8.32) 



(8.33) 



(8.34) 



(8.35) 
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With this it is possible to reduce the necessary interpolation rate to 

L\ > 2 U '/ 2-1 . (8.36) 

Figure 8.10 demonstrates this with a two-stage block diagram. First, interpolation up to a 
sampling rate L_1 f_S is performed by conventional filtering. In a second stage upsampling by 
factor L_2 is done by linear interpolation. The two-stage approach must satisfy the sampling 
rate L f_S = (L_1 L_2) f_S. 

The choice of the interpolation algorithm in the second stage enables the reduction of 
the first oversampling factor. More details are discussed in Section 8.2.2. 

Figure 8.10 Linear interpolation before virtual sample-and-hold function. 



8.3.1 Single-stage Methods 

Direct conversion methods implement the block diagram [Lag83, Smi84, Par90, Par91a, 
Par91b, Ada92, Ada93] shown in Fig. 8.9a. The calculation of a discrete sample on an 
output grid of sampling rate f_{S_O} from samples x(n) at sampling rate f_{S_I} can be written as 

DFT[x(n - \alpha)] = X(e^{j\Omega}) e^{-j\alpha\Omega} = X(e^{j\Omega}) H_\alpha(e^{j\Omega}), (8.37) 

where 0 \le \alpha < 1. With the transfer function 

H_\alpha(e^{j\Omega}) = e^{-j\alpha\Omega} (8.38) 

and the properties 

H(e^{j\Omega}) = \begin{cases} 1, & 0 \le |\Omega| \le \Omega_c, \\ 0, & \Omega_c < |\Omega| < \pi, \end{cases} (8.39) 

the impulse response is given by 

h_\alpha(n) = h(n - \alpha) = \frac{\Omega_c}{\pi} \frac{\sin[\Omega_c (n - \alpha)]}{\Omega_c (n - \alpha)}. (8.40) 

From (8.37) we can express the delayed signal 

x(n - \alpha) = \sum_{m=-\infty}^{\infty} x(m) h(n - \alpha - m) (8.41) 

= \sum_{m=-\infty}^{\infty} x(m) \frac{\Omega_c}{\pi} \frac{\sin[\Omega_c (n - \alpha - m)]}{\Omega_c (n - \alpha - m)} (8.42) 

as the convolution between x(n) and h(n - \alpha). Figure 8.11 illustrates this convolution in 
the time domain for a fixed \alpha. Figure 8.12 shows the coefficients h(n - \alpha_i) for discrete 
\alpha_i (i = 0, \ldots, 3) which are obtained from the intersection of the sinc function with the 
discrete samples x(n). 

Figure 8.11 Convolution sum (8.42) in the time domain. 

Figure 8.12 Convolution sum (8.42) for different \alpha_i. 

In order to limit the convolution sum, the impulse response is windowed, which gives 

h_W(n - \alpha_i) = w(n) \frac{\Omega_c}{\pi} \frac{\sin[\Omega_c (n - \alpha_i)]}{\Omega_c (n - \alpha_i)}, n = 0, \ldots, 2M. (8.43) 

From this, the sample estimate 

\hat{x}(n - \alpha_i) = \sum_{m=-M}^{M} x(m) h_W(n - \alpha_i - m) (8.44) 

results. A graphical interpretation of the time-variant impulse response which depends on 
\alpha_i is shown in Fig. 8.13. The discrete segmentation between two input samples into N 
intervals leads to N partial impulse responses of length 2M + 1. 
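
The windowed convolution sum can be tried out with a short Python sketch. It assumes \Omega_c = \pi, a Hann window and a sum centered on the current sample n, and it evaluates the sinc directly for every \alpha instead of using N stored partial impulse responses; these choices and all names are assumptions for illustration.

```python
import math

def h_windowed(t, M, omega_c=math.pi):
    """Windowed impulse response h_W(t) = w(t) * sin(omega_c t) / (pi t)."""
    ideal = omega_c / math.pi if t == 0 else math.sin(omega_c * t) / (math.pi * t)
    window = 0.5 + 0.5 * math.cos(math.pi * t / (M + 1))      # Hann window
    return ideal * window

def delayed_sample(x, n, alpha, M=8):
    """Estimate x(n - alpha) from the 2M + 1 samples around n, cf. (8.44)."""
    acc = 0.0
    for k in range(-M, M + 1):
        if 0 <= n - k < len(x):
            acc += x[n - k] * h_windowed(k - alpha, M)
    return acc

x = [math.sin(0.1 * n) for n in range(100)]
estimate = delayed_sample(x, 50, 0.5)      # value halfway between x(49) and x(50)
```

For this slowly varying test signal the estimate lies close to sin(0.1 \cdot 49.5); the remaining error stems from the finite length of the windowed impulse response.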

If the output sampling rate is smaller than the input sampling rate (f_{S_O} < f_{S_I}), 
band-limiting (anti-aliasing) to the output sampling rate has to be done. This can be achieved 
with factor \beta = f_{S_O}/f_{S_I} and leads, with the scaling theorem of the Fourier transform, to 

h(n - \alpha) = \frac{\beta\Omega_c}{\pi} \frac{\sin[\beta\Omega_c (n - \alpha)]}{\beta\Omega_c (n - \alpha)}. (8.45) 

This time-scaling of the impulse response has the consequence that the number of 
coefficients of the time-variant partial impulse responses is increased. The number of required 
states also increases. Figure 8.14 shows the time-scaled impulse response and elucidates 
the increase in the number M of the coefficients. 

Figure 8.13 Sinc function and different impulse responses. 

Figure 8.14 Time-scaled impulse response. 

8.3.2 Multistage Methods 

The basis of a multistage conversion method [Lag81, Lag82, Kat85, Kat86] is shown
in Fig. 8.15a and will be described in the frequency domain as shown in Fig. 8.15b-d.
Increasing the sampling rate up to $L f_S$ before the sample-and-hold function is done in four
stages. In the first two stages, the sampling rate is increased by a factor of 2 followed by
an anti-imaging filter (see Fig. 8.15b,c), which leads to a four times oversampled spectrum
(Fig. 8.15d). In the third stage, the signal is upsampled by a factor of 32 and the image
spectra are suppressed (see Fig. 8.15d,e). In the fourth stage (Fig. 8.15e) the signal is
upsampled to a sampling rate of $L f_S$ by a factor of 256 and a linear interpolator. The
sinc$^2$ function of the linear interpolator suppresses the images at multiples of $128 f_S$ up to
the spectrum at $L f_S$. The virtual sample-and-hold function is shown in Fig. 8.15f, where
resampling at the output sampling rate is performed. A direct conversion of this kind of
cascaded interpolation structure requires anti-imaging filtering after every upsampling with
the corresponding sampling rate. Although the necessary filter order decreases owing to
a decrease in requirements for filter design, an implementation of the filters in the third
and fourth stages is not possible directly. Following a suggestion by Lagadec [Lag82c],
the measurement of the ratio of input to output rate is used to control the polyphase
filters in the third and fourth stages (see Fig. 8.16a, CON = control) to reduce complexity.

Figures 8.16b-d illustrate an interpretation in the time domain. Figure 8.16b shows the
interpolation of three samples between two input samples x(n) with the help of the first and
second interpolation stage. The abscissa represents the intervals of the input sampling rate
and the sampling rate is increased by a factor of 4. In Fig. 8.16c the four times oversampled
signal is shown. The abscissa shows the four times oversampled output grid. It is assumed
that output sample y(m = 0) and input sample x(n = 0) are identical. The output sample
y(m = 1) is now determined in such a form that with the interpolator in the third stage only
the two polyphase filters just before and after the output sample need to be calculated. Hence,
only two out of a total of 31 possible polyphase filters are calculated in the third stage.
Figure 8.16d shows these two polyphase output samples. Between these two samples, the
output sample y(m = 1) is obtained with a linear interpolation on a grid of 255 values.
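One possible reading of this time-domain picture is that a 15-bit output time index on the grid
$L = 2^{15} = 4 \cdot 32 \cdot 256$ splits into three bit fields, one per group of stages. The
following Python sketch illustrates that reading only; the bit layout, the use of 32 polyphase
positions and the names `split_time_index` and `linear_blend` are assumptions of this example.

```python
def split_time_index(t, w=16):
    """Split a time index t on the grid L = 2**(w-1) into stage-wise fields.

    Returns (coarse, phase, frac):
      coarse - position on the 4x oversampled grid (stages 1 and 2)
      phase  - index of the polyphase filter just before the output sample (stage 3)
      frac   - linear interpolation position between two polyphase outputs (stage 4)
    """
    L = 1 << (w - 1)
    if not 0 <= t < L:
        raise ValueError("t must lie on the grid 0..L-1")
    lin_bits = 8   # fourth stage: factor 256
    poly_bits = 5  # third stage: factor 32
    frac = t & ((1 << lin_bits) - 1)
    phase = (t >> lin_bits) & ((1 << poly_bits) - 1)
    coarse = t >> (lin_bits + poly_bits)
    return coarse, phase, frac

def linear_blend(p_before, p_after, frac, steps=256):
    """Linear interpolation between the two polyphase output samples."""
    g = frac / steps
    return (1.0 - g) * p_before + g * p_after
```

With this split only the polyphase filters `phase` and `phase + 1` have to be evaluated for one
output sample, and `frac` selects the point on the straight line between their outputs.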

Instead of the third and fourth stages, special interpolation methods can be used to calculate
the output y(m) directly from the four times oversampled input signal (see Fig. 8.17)
[Sti91, Cuc91, Liu92]. The upsampling factor $L_3 = 2^{w-3}$ for the last stage is calculated
according to $L = 2^{w-1} = L_1 L_2 L_3 = 2^2 L_3$. Section 8.4 is devoted to different interpolation
methods which allow a real-time calculation of filter coefficients. This can be interpreted
as time-variant filters in which the filter coefficients are derived from the ratio of sampling
rates. The calculation of one filter coefficient set for the output sample at the output rate
is done by measuring the ratio of input to output sampling rate as described in the next
section.



8.3.3 Control of Interpolation Filters 

The measurement of the ratio of input and output sampling rate is used for controlling the
interpolation filters [Lag82a]. By increasing the sampling rate by a factor of L, the input
sampling period is divided into $L = 2^{w-1} = 2^{15}$ parts for a signal word-length of w = 16
bits. The time instant of the output sample is calculated on this grid with the help of the
measured ratio of sampling periods $T_{S_O}/T_{S_I}$ as follows.

A counter is clocked with $L f_{S_I}$ and reset by every new input sampling clock. A sawtooth
curve of the counter output versus time is obtained as shown in Fig. 8.18. The
counter runs from 0 to L - 1 during one input sampling period. The output sampling period
$T_{S_O}$ starts at time $t_{i-2}$, which corresponds to counter output $z_{i-2}$, and stops at time $t_{i-1}$,
with counter output $z_{i-1}$. The difference between both counter measurements allows the
calculation of the output sampling period $T_{S_O}$ with a resolution given by the counter clock $L f_{S_I}$.

Figure 8.15 Multistage conversion - frequency-domain interpretation.



The new counter measurement is added to the difference of previous counter measurements.
As a result, the new counter measurement is obtained as

$$t_i = (t_{i-1} + T_{S_O}) \oplus T_{S_I}, \tag{8.46}$$

i.e. the sum is taken modulo $T_{S_I}$. The modulo operation can be carried out with an
accumulator of word-length w - 1 = 15.
The resulting time $t_i$ determines the time instant of the output sample at the output sampling
rate and therefore the choice of the polyphase filter in a single-stage conversion or the time
instant for a multistage conversion.

Figure 8.16 Time-domain interpretation.

Figure 8.17 Sampling rate conversion with interpolation for calculating coefficients of a time-variant
interpolation filter.
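The accumulator of (8.46) can be sketched as follows. The code assumes that the period ratio is
already known exactly from the two nominal rates (in practice it is measured), and the function
name `output_time_instants` as well as the bookkeeping of the input sample index are additions made
for this illustration.

```python
def output_time_instants(f_in_hz, f_out_hz, count, w=16):
    """Return (input_index, t_i) pairs for the first `count` output samples.

    t_i is the position of the output sample inside the current input sampling
    period, expressed on the grid L = 2**(w-1).
    """
    L = 1 << (w - 1)
    # Output sampling period T_SO expressed in grid steps of T_SI / L (truncated)
    step = (L * f_in_hz) // f_out_hz
    n_in, t = 0, 0
    result = []
    for _ in range(count):
        result.append((n_in, t))
        # (8.46): add the output period, keep the remainder modulo one input period;
        # the carry tells how many input samples have been passed
        carry, t = divmod(t + step, L)
        n_in += carry
    return result
```

For a conversion from 32 kHz to 48 kHz the step is 21845 grid units, so the first output instants
fall at 0, 21845 and, after one wrap-around, 10922.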



The measurement of $T_{S_O}/T_{S_I}$ is illustrated in Fig. 8.19:

• The input sampling rate $f_{S_I}$ is increased to $M_Z f_{S_I}$ using a frequency multiplier
where $M_Z = 2^w$. This input clock increase by the factor $M_Z$ triggers a w-bit counter.
The counter output z is evaluated every $M_O$ output sampling periods.

• Counting of $M_O$ output sampling periods.

• Simultaneous counting of the $M_I$ input sampling periods.

Figure 8.18 Calculation of $t_i$.

Figure 8.19 Measurement of $T_{S_O}/T_{S_I}$.



The time intervals $d_1$ and $d_2$ (see Fig. 8.19) are given by

$$d_1 = M_I T_{S_I} + \frac{z - z_0}{M_Z}\, T_{S_I} = \left(M_I + \frac{z - z_0}{M_Z}\right) T_{S_I}, \tag{8.47}$$

$$d_2 = M_O T_{S_O}, \tag{8.48}$$

and with the requirement $d_1 = d_2$ we can write

$$M_O T_{S_O} = \left(M_I + \frac{z - z_0}{M_Z}\right) T_{S_I},$$

$$\frac{T_{S_O}}{T_{S_I}} = \frac{M_I + (z - z_0)/M_Z}{M_O} = \frac{M_Z M_I + (z - z_0)}{M_Z M_O}. \tag{8.49}$$

• Example 1: $w = 0 \rightarrow M_Z = 1$

$$\frac{T_{S_O}}{T_{S_I}} = \frac{M_I}{2^{15}}. \tag{8.50}$$

With a precision of 15 bits, the averaging number is chosen as $M_O = 2^{15}$ and the
number $M_I$ has to be determined.

• Example 2: $w = 8 \rightarrow M_Z = 2^8$

$$\frac{T_{S_O}}{T_{S_I}} = \frac{2^8 M_I + (z - z_0)}{2^8\, 2^7}. \tag{8.51}$$

With a precision of 15 bits, the averaging number is chosen as $M_O = 2^7$ and the
number $M_I$ and the counter outputs have to be determined.
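The two examples can be checked numerically with a small Python sketch of (8.49). It idealizes the
hardware: the rates are assumed to be exactly known and stable, the counter is assumed to start
at $z_0 = 0$, and the name `measure_ratio` is hypothetical.

```python
import math
from fractions import Fraction

def measure_ratio(f_in_hz, f_out_hz, w=8, M_O=128):
    """Estimate T_SO / T_SI from counted periods according to (8.49)."""
    M_Z = 1 << w
    # d2 = M_O * T_SO, expressed in units of the input sampling period T_SI
    d2 = Fraction(M_O * f_in_hz, f_out_hz)
    M_I = math.floor(d2)                # whole input periods inside d2
    dz = math.floor((d2 - M_I) * M_Z)   # counter difference z - z0 (with z0 = 0)
    return Fraction(M_Z * M_I + dz, M_Z * M_O)
```

For a conversion from 48 kHz to 44.1 kHz both parameter sets of the examples ($w = 0$, $M_O = 2^{15}$
and $w = 8$, $M_O = 2^7$) give the same estimate 35665/32768, which lies within $2^{-15}$ of the true
ratio 480/441. The second set needs a far shorter averaging interval for the same precision.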



The sampling rates at the input and output of a sampling rate converter can be calculated
by evaluating the 8-bit increment of the counter for each output clock with

$$z = \frac{T_{S_O}}{T_{S_I}}\, M_Z = \frac{f_{S_I}}{f_{S_O}} \cdot 256, \tag{8.52}$$

as seen from Table 8.1.



Table 8.1 Counter increments for different sampling rate conversions.

Conversion/kHz      8-bit counter increment
32   → 48           170
44.1 → 48           235
32   → 44.1         185
48   → 44.1         278
48   → 32           384
44.1 → 32           352
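The entries of Table 8.1 follow from (8.52) when the result is truncated to an integer. A short
Python check, with the hypothetical helper name `counter_increment` and the rates written in Hz to
keep the arithmetic in integers:

```python
CONVERSIONS_HZ = [
    (32000, 48000),
    (44100, 48000),
    (32000, 44100),
    (48000, 44100),
    (48000, 32000),
    (44100, 32000),
]

def counter_increment(f_in_hz, f_out_hz, w=8):
    """Counter increment z = (f_SI / f_SO) * 2**w per output clock, truncated."""
    return (f_in_hz << w) // f_out_hz

for f_in, f_out in CONVERSIONS_HZ:
    print(f"{f_in / 1000:g} -> {f_out / 1000:g} kHz: {counter_increment(f_in, f_out)}")
```

Note that the increments for the downward conversions exceed 255, so the measured value no longer
fits into eight bits in those cases.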



8.4 Interpolation Methods 

In the following sections, special interpolation methods are discussed. These methods 
enable the calculation of time-variant filter coefficients for sampling rate conversion and 
need an oversampled input sequence as well as the time instant of the output sample. A 
convolution of the oversampled input sequence with time-variant filter coefficients gives the 
output sample at the output sampling rate. This real-time computation of filter coefficients 
is not based on popular filter design methods. On the contrary, methods are presented for 
calculating filter coefficient sets for every input clock cycle where the filter coefficients 
are derived from the distance of output samples to the time grid of the oversampled input 
sequence. 

8.4.1 Polynomial Interpolation 

The aim of a polynomial interpolation [Liu92] is to determine a polynomial

$$p_N(x) = \sum_{i=0}^{N} a_i x^i \tag{8.53}$$

of Nth order representing exactly a function f(x) at N + 1 uniformly spaced $x_i$, i.e.
$p_N(x_i) = f(x_i) = y_i$ for $i = 0, \ldots, N$. This can be written as a set of linear equations

$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^N \\ 1 & x_1 & x_1^2 & \cdots & x_1^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^N \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_N \end{bmatrix}. \tag{8.54}$$



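The polynomial coefficients $a_i$ as functions of $y_0, \ldots, y_N$ are obtained with the help of
Cramer's rule according to

$$a_i = \frac{\begin{vmatrix} 1 & x_0 & x_0^2 & \cdots & y_0 & \cdots & x_0^N \\ 1 & x_1 & x_1^2 & \cdots & y_1 & \cdots & x_1^N \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & y_N & \cdots & x_N^N \end{vmatrix}}{\begin{vmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^N \\ 1 & x_1 & x_1^2 & \cdots & x_1^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^N \end{vmatrix}}, \quad i = 0, 1, \ldots, N, \tag{8.55}$$

where the values $y_0, \ldots, y_N$ replace the ith column of the numerator determinant.

For uniformly spaced $x_i = i$ with $i = 0, 1, \ldots, N$ the interpolation of an output sample
with distance $\alpha$ gives

$$y(n+\alpha) = \sum_{i=0}^{N} a_i (n+\alpha)^i. \tag{8.56}$$

In order to determine the relationship between the output sample $y(n+\alpha)$ and $y_i$, a set of
time-variant coefficients $c_i$ needs to be determined such that

$$y(n+\alpha) = \sum_{i=-N/2}^{N/2} c_i(\alpha)\, y(n+i). \tag{8.57}$$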



The calculation of time-variant coefficients $c_i(\alpha)$ will be illustrated by an example.

Example: Figure 8.20 shows the interpolation of an output sample of distance $\alpha$ with
N = 2 and using three samples, which can be written as

$$y(n+\alpha) = \sum_{i=0}^{2} a_i (n+\alpha)^i. \tag{8.58}$$

Figure 8.20 Polynomial interpolation with three samples.



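The samples $y(n+i)$, with $i = -1, 0, 1$, can be expressed as

$$\begin{aligned} y(n+1) &= \sum_{i=0}^{2} a_i (n+1)^i, & \alpha &= 1, \\ y(n) &= \sum_{i=0}^{2} a_i n^i, & \alpha &= 0, \\ y(n-1) &= \sum_{i=0}^{2} a_i (n-1)^i, & \alpha &= -1, \end{aligned} \tag{8.59}$$

or in matrix notation

$$\begin{bmatrix} 1 & (n+1) & (n+1)^2 \\ 1 & n & n^2 \\ 1 & (n-1) & (n-1)^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} y(n+1) \\ y(n) \\ y(n-1) \end{bmatrix}. \tag{8.60}$$

The coefficients $a_i$ as functions of $y_i$ are then given by

$$\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \frac{n(n-1)}{2} & 1 - n^2 & \frac{n(n+1)}{2} \\ -\frac{2n-1}{2} & 2n & -\frac{2n+1}{2} \\ \frac{1}{2} & -1 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} y(n+1) \\ y(n) \\ y(n-1) \end{bmatrix}, \tag{8.61}$$

such that

$$y(n+\alpha) = a_0 + a_1 (n+\alpha) + a_2 (n+\alpha)^2 \tag{8.62}$$

is valid. The output sample $y(n+\alpha)$ can be written as

$$y(n+\alpha) = \sum_{i=-1}^{1} c_i(\alpha)\, y(n+i) = c_{-1}\, y(n-1) + c_0\, y(n) + c_1\, y(n+1). \tag{8.63}$$

Equation (8.62) with $a_i$ from (8.61) leads to

$$\begin{aligned} y(n+\alpha) = {} & \left[\tfrac{1}{2} y(n+1) - y(n) + \tfrac{1}{2} y(n-1)\right] (n+\alpha)^2 \\ & + \left[-\tfrac{2n-1}{2}\, y(n+1) + 2n\, y(n) - \tfrac{2n+1}{2}\, y(n-1)\right] (n+\alpha) \\ & + \tfrac{n(n-1)}{2}\, y(n+1) + (1 - n^2)\, y(n) + \tfrac{n(n+1)}{2}\, y(n-1). \end{aligned} \tag{8.64}$$

Comparing the coefficients from (8.63) and (8.64) for n = 0 gives the coefficients

$$\begin{aligned} c_{-1} &= \tfrac{1}{2}\alpha(\alpha - 1), \\ c_0 &= -(\alpha - 1)(\alpha + 1) = 1 - \alpha^2, \\ c_1 &= \tfrac{1}{2}\alpha(\alpha + 1). \end{aligned}$$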
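As a quick plausibility check of these three coefficients, the following Python sketch evaluates
them and applies them to three neighbouring samples. The function names are chosen for this
example only.

```python
def quadratic_coeffs(alpha):
    """Time-variant coefficients (c_-1, c_0, c_1) of the three-point polynomial interpolator."""
    c_m1 = 0.5 * alpha * (alpha - 1.0)
    c_0 = 1.0 - alpha * alpha
    c_p1 = 0.5 * alpha * (alpha + 1.0)
    return c_m1, c_0, c_p1

def interpolate3(y_prev, y_cur, y_next, alpha):
    """y(n + alpha) from y(n-1), y(n), y(n+1) according to (8.63)."""
    c_m1, c_0, c_p1 = quadratic_coeffs(alpha)
    return c_m1 * y_prev + c_0 * y_cur + c_p1 * y_next
```

Because the interpolator is built from a second-order polynomial, it reproduces any quadratic
exactly. For $f(x) = 3x^2 - 2x + 1$ sampled at $x = -1, 0, 1$ (values 6, 1, 2) and $\alpha = 0.5$
the coefficients are $(-0.125, 0.75, 0.375)$ and the result is $f(0.5) = 0.75$.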



8.4.2 Lagrange Interpolation 

Lagrange interpolation for N + 1 samples makes use of the polynomials $l_i(x)$ which have
the following properties (see Fig. 8.21):

$$l_i(x_k) = \delta_{ik} = \begin{cases} 1, & i = k, \\ 0, & \text{elsewhere}. \end{cases} \tag{8.65}$$

Based on the zeros of the polynomial $l_i(x)$, it follows that

$$l_i(x) = a_i (x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_N). \tag{8.66}$$

With $l_i(x_i) = 1$ the coefficients are given by

$$a_i(x_i) = \frac{1}{(x_i - x_0) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_N)}. \tag{8.67}$$

The interpolation polynomial is expressed as

$$p_N(x) = \sum_{i=0}^{N} l_i(x)\, y_i = l_0(x)\, y_0 + \cdots + l_N(x)\, y_N. \tag{8.68}$$

With $a = \prod_{j=0}^{N} (x - x_j)$, (8.66) can be written as

$$l_i(x) = a_i\, \frac{a}{x - x_i} = \frac{1}{\prod_{j=0, j \ne i}^{N} (x_i - x_j)}\, \frac{\prod_{j=0}^{N} (x - x_j)}{x - x_i} = \prod_{j=0, j \ne i}^{N} \frac{x - x_j}{x_i - x_j}. \tag{8.69}$$



For uniformly spaced samples

$$x_i = x_0 + ih \tag{8.70}$$

and with the new variable $\alpha$ as given by

$$x = x_0 + \alpha h, \tag{8.71}$$

we get

$$\frac{x - x_j}{x_i - x_j} = \frac{(x_0 + \alpha h) - (x_0 + jh)}{(x_0 + ih) - (x_0 + jh)} = \frac{\alpha - j}{i - j} \tag{8.72}$$

and hence

$$l_i(x(\alpha)) = \prod_{j=0, j \ne i}^{N} \frac{\alpha - j}{i - j}. \tag{8.73}$$

For even N we can write

$$l_i(x(\alpha)) = \prod_{j=-N/2, j \ne i}^{N/2} \frac{\alpha - j}{i - j}, \tag{8.74}$$

and for odd N,

$$l_i(x(\alpha)) = \prod_{j=-(N-1)/2, j \ne i}^{(N+1)/2} \frac{\alpha - j}{i - j}. \tag{8.75}$$

The interpolation of an output sample is given by

$$y(n+\alpha) = \sum_{i=-N/2}^{N/2} l_i(\alpha)\, y(n+i). \tag{8.76}$$





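Figure 8.21 Lagrange polynomial.

Example: For N = 2, i.e. three samples,

$$\begin{aligned} l_{-1}(x(\alpha)) &= \prod_{j=-1, j \ne -1}^{1} \frac{\alpha - j}{-1 - j} = \tfrac{1}{2}\alpha(\alpha - 1), \\ l_0(x(\alpha)) &= \prod_{j=-1, j \ne 0}^{1} \frac{\alpha - j}{0 - j} = -(\alpha - 1)(\alpha + 1) = 1 - \alpha^2, \\ l_1(x(\alpha)) &= \prod_{j=-1, j \ne 1}^{1} \frac{\alpha - j}{1 - j} = \tfrac{1}{2}\alpha(\alpha + 1). \end{aligned}$$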
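The product formulas (8.74) and (8.75) translate directly into a short routine. The sketch below is
an illustration only; the name `lagrange_coeffs` and the choice of returning a dictionary keyed by
the sample offset i are assumptions of this example.

```python
def lagrange_coeffs(alpha, N):
    """Return {i: l_i(alpha)} for the N+1 samples y(n+i) used by an Nth-order interpolator."""
    if N % 2 == 0:
        lo, hi = -(N // 2), N // 2              # even N, see (8.74)
    else:
        lo, hi = -((N - 1) // 2), (N + 1) // 2  # odd N, see (8.75)
    coeffs = {}
    for i in range(lo, hi + 1):
        c = 1.0
        for j in range(lo, hi + 1):
            if j != i:
                c *= (alpha - j) / (i - j)
        coeffs[i] = c
    return coeffs

def lagrange_sample(y, n, alpha, N):
    """Interpolate y(n + alpha) as the weighted sum of the neighbouring samples."""
    return sum(c * y[n + i] for i, c in lagrange_coeffs(alpha, N).items())
```

For N = 2 the routine returns the same three values as the closed-form expressions of the example,
which is the expected agreement between polynomial and Lagrange interpolation.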



8.4.3 Spline Interpolation 

The interpolation using piecewise defined functions that only exist over finite intervals
is called spline interpolation [Cuc91]. The goal is to compute the sample
$y(n+\alpha) = \sum_i N_i^N(\alpha)\, y(n+i)$ from weighted samples $y(n+i)$.

A B-spline $M_k^N(x)$ of Nth order using m + 1 samples is defined in the interval
$[x_k, \ldots, x_{k+m}]$ by

$$M_k^N(x) = \sum_{i=k}^{k+m} a_i\, \phi_i(x) \tag{8.77}$$

with the truncated power functions

$$\phi_i(x) = (x - x_i)_+^N = \begin{cases} 0, & x < x_i, \\ (x - x_i)^N, & x \ge x_i. \end{cases} \tag{8.78}$$



In the following, $M_0^N(x) = \sum_{i=0}^{m} a_i \phi_i(x)$ will be considered for k = 0, where $M_0^N(x) = 0$
for $x < x_0$ and $M_0^N(x) = 0$ for $x \ge x_m$. Figure 8.22 shows the truncated power functions
and the B-spline of Nth order. With the definition of the truncated power functions we can
write

$$\begin{aligned} M_0^N(x) &= a_0 \phi_0(x) + a_1 \phi_1(x) + \cdots + a_m \phi_m(x) \\ &= a_0 (x - x_0)_+^N + a_1 (x - x_1)_+^N + \cdots + a_m (x - x_m)_+^N, \end{aligned} \tag{8.79}$$

and after some calculations we get

$$\begin{aligned} M_0^N(x) = {} & a_0 (x_0^N + c_1 x_0^{N-1} x + \cdots + c_{N-1} x_0 x^{N-1} + x^N) \\ & + a_1 (x_1^N + c_1 x_1^{N-1} x + \cdots + c_{N-1} x_1 x^{N-1} + x^N) \\ & \;\;\vdots \\ & + a_m (x_m^N + c_1 x_m^{N-1} x + \cdots + c_{N-1} x_m x^{N-1} + x^N). \end{aligned} \tag{8.80}$$

Figure 8.22 Truncated power functions and the B-spline of Nth order.



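With the condition $M_0^N(x) = 0$ for $x \ge x_m$, the following set of linear equations can be
written with (8.80) and the coefficients of the powers of x:

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_m \\ x_0^2 & x_1^2 & \cdots & x_m^2 \\ \vdots & \vdots & & \vdots \\ x_0^N & x_1^N & \cdots & x_m^N \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{8.81}$$

The homogeneous set of linear equations has non-trivial solutions for m > N. The minimum
requirement results in m = N + 1. For m = N + 1, the coefficients [Boe93] can be
obtained as follows:

$$a_i = \frac{\begin{vmatrix} 1 & 1 & 1 & \cdots & 0 & \cdots & 1 \\ x_0 & x_1 & x_2 & \cdots & 0 & \cdots & x_{N+1} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ x_0^N & x_1^N & x_2^N & \cdots & 0 & \cdots & x_{N+1}^N \end{vmatrix}}{\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_0 & x_1 & x_2 & \cdots & x_{N+1} \\ \vdots & \vdots & \vdots & & \vdots \\ x_0^{N+1} & x_1^{N+1} & x_2^{N+1} & \cdots & x_{N+1}^{N+1} \end{vmatrix}}, \quad i = 0, 1, \ldots, N+1, \tag{8.82}$$

where the zero column in the numerator is the ith column.

Setting the ith column of the determinant in the numerator of (8.82) equal to zero corresponds
to deleting the column. Computing both determinants of Vandermonde matrices
[Bar90] and division leads to the coefficients

$$a_i = \frac{1}{\prod_{j=0, j \ne i}^{N+1} (x_i - x_j)} \tag{8.83}$$

and hence

$$M_0^N(x) = \sum_{i=0}^{N+1} \frac{(x - x_i)_+^N}{\prod_{j=0, j \ne i}^{N+1} (x_i - x_j)}. \tag{8.84}$$

For some k we obtain

$$M_k^N(x) = \sum_{i=k}^{k+N+1} \frac{(x - x_i)_+^N}{\prod_{j=k, j \ne i}^{k+N+1} (x_i - x_j)}. \tag{8.85}$$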



Since the functions $M_k^N(x)$ decrease with increasing N, a normalization of the form
$N_k^N(x) = (x_{k+N+1} - x_k)\, M_k^N(x)$ is done, such that for equidistant samples we get

$$N_k^N(x) = (N + 1) \cdot M_k^N(x). \tag{8.86}$$

The next example illustrates the computation of B-splines.

Example: For N = 3, m = 4, and five samples the coefficients according to (8.83) are given
by

$$\begin{aligned} a_0 &= \frac{1}{(x_0 - x_4)(x_0 - x_3)(x_0 - x_2)(x_0 - x_1)}, \\ a_1 &= \frac{1}{(x_1 - x_4)(x_1 - x_3)(x_1 - x_2)(x_1 - x_0)}, \\ a_2 &= \frac{1}{(x_2 - x_4)(x_2 - x_3)(x_2 - x_1)(x_2 - x_0)}, \\ a_3 &= \frac{1}{(x_3 - x_4)(x_3 - x_2)(x_3 - x_1)(x_3 - x_0)}, \\ a_4 &= \frac{1}{(x_4 - x_3)(x_4 - x_2)(x_4 - x_1)(x_4 - x_0)}. \end{aligned}$$

Figure 8.23a,b shows the truncated power functions and their summation for calculating
$N_0^3(x)$. In Fig. 8.23c the horizontally shifted $N_i^3(x)$ are depicted.

Figure 8.23 Third-order B-spline (N = 3, m = 4, 5 samples).

Figure 8.24 Interpolation with B-splines of second and third order.



A linear combination of B-splines is called a spline. Figure 8.24 shows the interpolation
of sample $y(n+\alpha)$ for splines of second and third order. The shifted B-splines $N_i^N(x)$ are
evaluated at the vertical line representing the distance $\alpha$. With sample $y(n)$ and the normalized
B-splines $N_i^N(x)$, the second- and third-order splines are respectively expressed as

$$y(n+\alpha) = \sum_{i=-1}^{1} y(n+i)\, N_{n-1+i}^2(\alpha) \tag{8.87}$$

and

$$y(n+\alpha) = \sum_{i=-1}^{2} y(n+i)\, N_{n-2+i}^3(\alpha). \tag{8.88}$$



The computation of a second-order B-spline at the sample index $\alpha$ is based on the
symmetry properties of the B-spline, as depicted in Fig. 8.25. With (8.77), (8.86) and the
symmetry properties shown in Fig. 8.25, the B-splines can be written in the form

$$\begin{aligned} N_2^2(\alpha) &= N_0^2(\alpha) = 3 \sum_{i=0}^{3} a_i (\alpha - x_i)_+^2, \\ N_1^2(1+\alpha) &= N_0^2(1+\alpha) = 3 \sum_{i=0}^{3} a_i (1 + \alpha - x_i)_+^2, \\ N_0^2(2+\alpha) &= N_0^2(1-\alpha) = 3 \sum_{i=0}^{3} a_i (2 + \alpha - x_i)_+^2 = 3 \sum_{i=0}^{3} a_i (1 - \alpha - x_i)_+^2. \end{aligned} \tag{8.89}$$

With (8.83) we get the coefficients

$$\begin{aligned} a_0 &= \frac{1}{(0-1)(0-2)(0-3)} = -\frac{1}{6}, \\ a_1 &= \frac{1}{(1-0)(1-2)(1-3)} = \frac{1}{2}, \\ a_2 &= \frac{1}{(2-0)(2-1)(2-3)} = -\frac{1}{2}, \end{aligned} \tag{8.90}$$

and thus

$$\begin{aligned} N_2^2(\alpha) &= 3[a_0 \alpha^2] = -\tfrac{1}{2}\alpha^2, \\ N_1^2(\alpha) &= 3[a_0 (1+\alpha)^2 + a_1 \alpha^2] = -\tfrac{1}{2}(1+\alpha)^2 + \tfrac{3}{2}\alpha^2, \\ N_0^2(\alpha) &= 3[a_0 (1-\alpha)^2] = -\tfrac{1}{2}(1-\alpha)^2. \end{aligned} \tag{8.91}$$

Figure 8.25 Exploiting the symmetry properties of a second-order B-spline.

Owing to the symmetrical properties of the B-splines, the time-variant coefficients of
the second-order B-spline can be derived as

$$N_2^2(\alpha) = h(1) = -\tfrac{1}{2}\alpha^2, \tag{8.92}$$

$$N_1^2(\alpha) = h(2) = -\tfrac{1}{2}(1+\alpha)^2 + \tfrac{3}{2}\alpha^2, \tag{8.93}$$

$$N_0^2(\alpha) = h(3) = -\tfrac{1}{2}(1-\alpha)^2. \tag{8.94}$$

In the same way the time-variant coefficients of a third-order B-spline are given by

$$N_3^3(\alpha) = h(1) = \tfrac{1}{6}\alpha^3, \tag{8.95}$$

$$N_2^3(\alpha) = h(2) = \tfrac{1}{6}(1+\alpha)^3 - \tfrac{2}{3}\alpha^3, \tag{8.96}$$

$$N_1^3(\alpha) = h(3) = \tfrac{1}{6}(2-\alpha)^3 - \tfrac{2}{3}(1-\alpha)^3, \tag{8.97}$$

$$N_0^3(\alpha) = h(4) = \tfrac{1}{6}(1-\alpha)^3. \tag{8.98}$$
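The third-order coefficients can be cross-checked against the truncated-power definition with a
short Python sketch. Several points are assumptions of this example and not statements of the
text: the knots are taken as the integers 0, ..., N+1, the function names are hypothetical, and
the assignment of h(4), h(3), h(2), h(1) to the samples y(n-1), y(n), y(n+1), y(n+2) is one
plausible reading of (8.88). Evaluated with the coefficient formula (8.83) as written, the
even-order B-splines come out with a negative sign (the three second-order weights sum to -1),
so the sketch applies the weights only for N = 3.

```python
def bspline_value(x, N):
    """Normalized B-spline N_0^N(x) on integer knots 0..N+1, built from (8.83), (8.84), (8.86)."""
    total = 0.0
    for i in range(N + 2):
        denom = 1
        for j in range(N + 2):
            if j != i:
                denom *= (i - j)           # product of (x_i - x_j) with x_i = i
        if x >= i:                         # truncated power function (x - x_i)_+^N
            total += (x - i) ** N / denom
    return (N + 1) * total

def cubic_weights(alpha):
    """Closed-form third-order coefficients h(1)..h(4)."""
    h1 = alpha ** 3 / 6.0
    h2 = (1.0 + alpha) ** 3 / 6.0 - 2.0 * alpha ** 3 / 3.0
    h3 = (2.0 - alpha) ** 3 / 6.0 - 2.0 * (1.0 - alpha) ** 3 / 3.0
    h4 = (1.0 - alpha) ** 3 / 6.0
    return h1, h2, h3, h4

def cubic_spline_sample(y, n, alpha):
    """y(n + alpha) as a weighted sum of y(n-1), y(n), y(n+1), y(n+2) (assumed mapping)."""
    h1, h2, h3, h4 = cubic_weights(alpha)
    return h4 * y[n - 1] + h3 * y[n] + h2 * y[n + 1] + h1 * y[n + 2]
```

The four cubic weights sum to one for every $\alpha$, and a linearly increasing sequence is
reproduced exactly. A general sequence is smoothed rather than interpolated, because the weights
at $\alpha = 0$ are 1/6, 2/3 and 1/6.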


Higher-order B-splines are given by

$$y(n+\alpha) = \sum_{i=-2}^{2} y(n+i)\, N_{n-2+i}^4(\alpha), \tag{8.99}$$

$$y(n+\alpha) = \sum_{i=-2}^{3} y(n+i)\, N_{n-3+i}^5(\alpha), \tag{8.100}$$

$$y(n+\alpha) = \sum_{i=-3}^{3} y(n+i)\, N_{n-3+i}^6(\alpha). \tag{8.101}$$

Similar sets of coefficients can be derived here as well. Figure 8.26 illustrates this for
fourth- and sixth-order B-splines.

Generally, for even orders we get

$$y(n+\alpha) = \sum_{i=-N/2}^{N/2} N_{N/2+i}^N(\alpha)\, y(n+i), \tag{8.102}$$

and for odd orders

$$y(n+\alpha) = \sum_{i=-(N-1)/2}^{(N+1)/2} N_{(N-1)/2+i}^N(\alpha)\, y(n+i). \tag{8.103}$$



For the application of interpolation the properties in the frequency domain are important.
The zero-order B-spline is given by

$$N_0^0(x) = \sum_{i=0}^{1} a_i \phi_i(x) = \begin{cases} 0, & x < 0, \\ 1, & 0 \le x < 1, \\ 0, & x \ge 1, \end{cases} \tag{8.104}$$

and the Fourier transform gives the sinc function in the frequency domain. The first-order
B-spline given by

$$N_0^1(x) = 2 \sum_{i=0}^{2} a_i \phi_i(x) = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}x, & 0 \le x < 1, \\ 1 - \tfrac{1}{2}x, & 1 \le x < 2, \\ 0, & x \ge 2, \end{cases} \tag{8.105}$$

leads to a sinc$^2$ function in the frequency domain. Higher-order B-splines can be derived
by repeated convolution [Chu92] as given by

$$N^N(x) = N^0(x) * N^{N-1}(x). \tag{8.106}$$

Thus, the Fourier transform leads to

$$\mathrm{FT}[N^N(x)] = \mathrm{sinc}^{N+1}(f). \tag{8.107}$$

With the help of the properties in the frequency domain, the necessary order of the spline
interpolation can be determined. Owing to the attenuation properties of the $\mathrm{sinc}^{N+1}(f)$
function and the simple real-time calculation of the coefficients, spline interpolation is well
suited to time-variant conversion in the last stage of a multistage sampling rate conversion
system [Zöl94].

Figure 8.26 Interpolation with B-splines of fourth and sixth order.

8.5 Exercises

1. Basics 

Consider a simple sampling rate conversion system with a conversion rate of 4/3. The system
consists of two upsampling blocks, each by 2, and one downsampling block by 3.

1. What are anti-imaging and anti-aliasing filters and where do we need them in our 
system? 

2. Sketch the block diagram. 

3. Sketch the input, intermediate and output spectra in the frequency domain. 

4. How is the amplitude affected by the up- and downsampling and where does it come 
from? 

5. Sketch the frequency response of the anti-aliasing and anti-imaging filters needed for 
this upsampling system. 

2. Synchronous Conversion 

Our system will now be upsampled directly by a factor of 4, and again downsampled by
a factor of 3, but with linear interpolation and decimation methods. The input signal is
$x(n) = \sin(n\pi/6)$, $n = 0, \ldots, 48$.

1. What are the impulse responses of the two interpolation filters? Sketch their magni- 
tude responses. 

2. Plot the signals (input, intermediate and output signal) in the time domain using 
Matlab. 

3. What is the delay resulting from the causal interpolation/decimation filters? 

4. Show the error introduced by this interpolation/decimation method, in the frequency 
domain. 



3. Polyphase Representation 

Now we extend our system using a polyphase decomposition of the interpolation/decima- 
tion filters. 

1. Sketch the idea of polyphase decomposition using a block diagram. What is the 
benefit of such decomposition? 

2. Calculate the polyphase filters for up- and downsampling (using interpolation and 
decimation). 

3. Use Matlab to plot all resulting signals in the time and frequency domain.

4. Asynchronous Conversion

1. What is the basic concept of asynchronous sampling rate conversion?

2. Sketch the block diagram and discuss the individual operations. 

3. What is the necessary oversampling factor L for a 20-bit resolution? 

4. How can we simplify the oversampling operations? 

5. How can we make use of polyphase filtering? 

6. Why are halfband filters an efficient choice for the upsampling operation? 

7. Which parameters determine the interpolation algorithms in the last stage of the
conversion?

Chapter 9

Audio Coding 



For transmission and storage of audio signals, different methods for compressing data have 
been investigated besides the pulse code modulation representation. The requirements of 
different applications have led to a variety of audio coding methods which have become 
international standards. In this chapter basic principles of audio coding are introduced and 
the most important audio coding standards discussed. Audio coding can be divided into 
two types: lossless and lossy. Lossless audio coding is based on a statistical model of the 
signal amplitudes and coding of the audio signal (audio coder). The reconstruction of the 
audio signal at the receiver allows a lossless resynthesis of the signal amplitudes of the 
original audio signal (audio decoder). On the other hand, lossy audio coding makes use of a 
psychoacoustic model of human acoustic perception to quantize and code the audio signal. 
In this case only the acoustically relevant parts of the signal are coded and reconstructed 
at the receiver. The samples of the original audio signal are not exactly reconstructed. The 
objective of both audio coding methods is a data rate reduction or data compression for 
transmission or storage compared to the original PCM signal. 

9.1 Lossless Audio Coding 

Lossless audio coding is based on linear prediction followed by entropy coding [Jay84] as 
shown in Fig. 9.1: 

• Linear Prediction. A quantized set of coefficients P for a block of M samples is
determined which leads to an estimate $\hat{x}(n)$ of the input sequence x(n). The aim is to
minimize the power of the difference signal d(n) without any additional quantization
errors, i.e. the word-length of the signal $\hat{x}(n)$ must be equal to the word-length of
the input. An alternative approach [Han98, Han01] quantizes the prediction signal
$\hat{x}(n)$ such that the word-length of the difference signal d(n) remains the same as the
input signal word-length. Figure 9.2 shows a signal block x(n) and the corresponding
spectrum |X(f)|. Filtering the input signal with the predictor filter transfer function
P(z) delivers the estimate $\hat{x}(n)$. Subtracting the prediction signal from the input yields the
prediction error d(n), which is also shown in Fig. 9.2 and which has a considerably
lower power compared to the input power. The spectrum of this prediction error is
nearly white (see Fig. 9.2, lower right). The prediction can be represented as a filter
operation with an analysis transfer function $H_A(z) = 1 - P(z)$ on the coder side.

Figure 9.1 Lossless audio coding based on linear prediction and entropy coding.

• Entropy Coding. Quantization of the signal d(n) according to the probability density
function of the block. Samples d(n) of greater probability are coded with shorter data
words, whereas samples d(n) of lesser probability are coded with longer data words
[Huf52].

• Frame Packing. The frame packing uses the quantized and coded difference signal
and the coding of the M coefficients of the predictor filter P(z) of order M.

• Decoder. On the decoder side the inverse synthesis transfer function
$H_S(z) = H_A^{-1}(z) = [1 - P(z)]^{-1}$ reconstructs the input signal with the coded difference
samples and the M filter coefficients. The frequency response of this synthesis filter
represents the spectral envelope shown in the upper right part of Fig. 9.2. The synthesis
filter shapes the white spectrum of the difference (prediction error) signal with the
spectral envelope of the input spectrum.
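The interplay of analysis filter $1 - P(z)$ and synthesis filter $[1 - P(z)]^{-1}$ can be sketched
with integer arithmetic. The example below is a simplification under stated assumptions: it uses
a fixed predictor instead of block-wise optimized and quantized coefficients, it omits entropy
coding and frame packing, and the function names are hypothetical.

```python
import math

def predict(history, coeffs):
    """Integer estimate x_hat(n) = sum_k p_k * x(n-k), rounded to the input word-length grid."""
    acc = sum(p * history[-k] for k, p in enumerate(coeffs, start=1))
    return math.floor(acc + 0.5)

def encode(x, coeffs):
    """Coder side: difference signal d(n) = x(n) - x_hat(n)."""
    history = [0] * len(coeffs)
    d = []
    for sample in x:
        d.append(sample - predict(history, coeffs))
        history.append(sample)
    return d

def decode(d, coeffs):
    """Decoder side: x(n) = d(n) + x_hat(n), using the already reconstructed samples."""
    history = [0] * len(coeffs)
    x = []
    for diff in d:
        sample = diff + predict(history, coeffs)
        x.append(sample)
        history.append(sample)
    return x
```

Since coder and decoder form the same integer prediction from identical past samples, the
reconstruction is bit-exact. With the fixed predictor $P(z) = 2z^{-1} - z^{-2}$ (coefficients
`(2, -1)`), a slowly varying sinusoid yields a difference signal of much lower power than the
input, which is what makes the subsequent entropy coding worthwhile.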

The attainable compression rates depend on the statistics of the audio signal and allow a
compression rate of up to 2 [Bra92, Cel93, Rob94, Cra96, Cra97, Pur97, Han98, Han01,
Lie02, Raa02, Sch02]. Figure 9.3 illustrates examples of the necessary word-length for
lossless audio coding [Blo95, Sqa88]. Besides the local entropy of the signal (entropy
computed over a block length of 256), results for linear prediction followed by Huffman coding
[Huf52] are presented. Huffman coding is carried out with a fixed code table [Pen93] and
a power-controlled choice of adapted code tables. It is observed from Fig. 9.3 that for high
signal powers, a reduction in word-length is possible if the choice is made from several
adapted code tables. Lossless compression methods are used for storage media with limited
word-length (16 bits in CD and DAT) which are used for recording audio signals of higher
word-lengths (more than 16 bits). Further applications are in the transmission and archiving
of audio signals.

Figure 9.2 Signals and spectra for linear prediction.



9.2 Lossy Audio Coding 

Significantly higher compression rates (of factor 4 to 8) can be obtained with lossy coding 
methods. Psychoacoustic phenomena of human hearing are used for signal compression. 
The fields of application have a wide range, from professional audio like source coding for 
DAB to audio transmission via ISDN and home entertainment like DCC and MiniDisc. 

An outline of the coding methods [Bra94] is standardized in an international specification
ISO/IEC 11172-3 [ISO92], which is based on the following processing (see Fig. 9.4):

• subband decomposition with filter banks of short latency time;

• calculation of psychoacoustic model parameters based on short-time FFT;

• dynamic bit allocation based on the psychoacoustic model parameters (signal-to-mask ratio
SMR);

• quantization and coding of subband signals;

• multiplex and frame packing.

Figure 9.4 Lossy audio coding based on subband coding and psychoacoustic models.



Owing to lossy audio coding, post-processing of such signals or several coding and 
decoding steps is associated with some additional problems. The high compression rates 
justify the use of lossy audio coding techniques in applications like transmission. 



9.3 Psychoacoustics 

In this section, basic principles of psychoacoustics are presented. The results of psychoa- 
coustic investigations by Zwicker [Zwi82, Zwi90] form the basis for audio coding based on 
models of human perception. These coded audio signals have a significantly reduced data 
rate compared to the linearly quantized PCM representation. The human auditory system 
analyzes broad-band signals in so-called critical bands. The aim of psychoacoustic coding 
of audio signals is to decompose the broad-band audio signal into subbands which are 
matched to the critical bands and then perform quantization and coding of these subband 
signals [Joh88a, Joh88b, Thei88]. Since the perception of sound below the absolute thresh- 
old of hearing is not possible, subband signals below this threshold need neither be coded 
nor transmitted. In addition to the perception in critical bands and the absolute threshold, 
the effects of signal masking in human perception play an important role in signal coding. 
These are explained in the following and their application to psychoacoustic coding is 
discussed. 



9.3.1 Critical Bands and Absolute Threshold 

Critical Bands. Critical bands as investigated by Zwicker are listed in Table 9.1.

Table 9.1 Critical bands as given by Zwicker [Zwi82].

z/Bark    f_l/Hz    f_u/Hz    f_B/Hz    f_c/Hz
0              0       100       100        50
1            100       200       100       150
2            200       300       100       250
3            300       400       100       350
4            400       510       110       450
5            510       630       120       570
6            630       770       140       700
7            770       920       150       840
8            920      1080       160      1000
9           1080      1270       190      1170
10          1270      1480       210      1370
11          1480      1720       240      1600
12          1720      2000       280      1850
13          2000      2320       320      2150
14          2320      2700       380      2500
15          2700      3150       450      2900
16          3150      3700       550      3400
17          3700      4400       700      4000
18          4400      5300       900      4800
19          5300      6400      1100      5800
20          6400      7700      1300      7000
21          7700      9500      1800      8500
22          9500     12000      2500     10500
23         12000     15500      3500     13500
24         15500









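A transformation of the linear frequency scale into a hearing-adapted scale is given by
Zwicker [Zwi90] (units of z in Bark):

$$ \frac{z}{\mathrm{Bark}} = 13 \arctan\left(0.76\,\frac{f}{\mathrm{kHz}}\right) + 3.5 \arctan\left[\left(\frac{f}{7.5\,\mathrm{kHz}}\right)^{2}\right]. \tag{9.1} $$

The individual critical bands have bandwidths

$$ \Delta f_B = 25 + 75\left[1 + 1.4\left(\frac{f}{\mathrm{kHz}}\right)^{2}\right]^{0.69}\ \mathrm{Hz}. \tag{9.2} $$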
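As a minimal sketch, equations (9.1) and (9.2) can be evaluated directly. The function names below are illustrative and do not come from the text.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark according to (9.1)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz according to (9.2)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

# Print z and the bandwidth for a few frequencies for comparison with Table 9.1.
for f in (100.0, 1000.0, 4000.0):
    print(f, round(hz_to_bark(f), 2), round(critical_bandwidth_hz(f), 1))
```

At 1 kHz the bandwidth formula gives roughly 162 Hz, which is close to the 160 Hz listed in Table 9.1 for the band centered at 1000 Hz.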



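Absolute Threshold. The absolute threshold $L_{T_q}$ (threshold in quiet) denotes the curve
of sound pressure level $L$ [Zwi82] versus frequency at which a sinusoidal tone is just
perceived. The absolute threshold is given by [Ter79]:

$$ \frac{L_{T_q}}{\mathrm{dB}} = 3.64\left(\frac{f}{\mathrm{kHz}}\right)^{-0.8} - 6.5\exp\left[-0.6\left(\frac{f}{\mathrm{kHz}} - 3.3\right)^{2}\right] + 10^{-3}\left(\frac{f}{\mathrm{kHz}}\right)^{4}. \tag{9.3} $$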
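The threshold in quiet of (9.3) can be sketched as a short function. The name `threshold_in_quiet_db` is an assumption made for this illustration, and the formula is only meaningful for f > 0.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Absolute threshold L_Tq in dB according to (9.3); requires f_hz > 0."""
    f_khz = f_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# The curve is high at low frequencies, has its minimum near 3.3 kHz
# and rises again toward high frequencies.
for f in (100.0, 1000.0, 3300.0, 10000.0):
    print(f, round(threshold_in_quiet_db(f), 2))
```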



Below the absolute threshold, no perception of signals is possible. Figure 9.5 shows the
absolute threshold versus frequency. Band-splitting in critical bands and the absolute
threshold allow the calculation of an offset between the signal level and the absolute
threshold for every critical band. This offset determines the appropriate quantization
step size for each critical band.




Figure 9.5 Absolute threshold (threshold in quiet). 



9.3.2 Masking 

For audio coding, exploiting sound perception in critical bands and the absolute threshold
alone is not sufficient for high compression rates. The basis for further data reduction is
the masking effects investigated by Zwicker [Zwi82, Zwi90]. For band-limited noise or a
sinusoidal signal, frequency-dependent masking thresholds can be given. Frequency
components that lie below such a masking threshold are masked (see Fig. 9.6). The
application of masking for perceptual coding is described in the following.




Figure 9.6 Masking threshold of band-limited noise. 



Calculation of Signal Power in Band i. First, the sound pressure level within a critical
band is calculated. The short-time spectrum $X(k) = \mathrm{DFT}[x(n)]$, with real part
$X_R$ and imaginary part $X_I$, is used to calculate the power density spectrum

$$ S_p(e^{j\Omega}) = S_p(e^{j(2\pi k/N)}) = X_R^2(e^{j(2\pi k/N)}) + X_I^2(e^{j(2\pi k/N)}), \tag{9.4} $$

$$ S_p(k) = X_R^2(k) + X_I^2(k), \quad 0 \le k \le N-1, \tag{9.5} $$

with the help of an N-point FFT. The signal power in band i is calculated by the sum

$$ S_p(i) = \sum_{\Omega=\Omega_{l_i}}^{\Omega_{u_i}} S_p(k) \tag{9.6} $$

from the lower frequency up to the upper frequency of critical band i. The sound pressure
level in band i is given by $L_S(i) = 10 \log_{10} S_p(i)$.
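The band power computation of (9.5) and (9.6) can be sketched as follows. This is an illustration under stated assumptions: a direct DFT stands in for the FFT, and the band edges are passed in as a list of frequencies in Hz (for example the f_l column of Table 9.1). The function names are hypothetical.

```python
import cmath
import math

def power_spectrum(x):
    """S_p(k) = X_R(k)^2 + X_I(k)^2 via a direct DFT (an FFT would be used in practice)."""
    n_fft = len(x)
    sp = []
    for k in range(n_fft):
        xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        sp.append(xk.real ** 2 + xk.imag ** 2)
    return sp

def band_levels_db(x, fs, band_edges_hz):
    """Sum S_p(k) from the lower to the upper edge of each band and convert to dB."""
    n_fft = len(x)
    sp = power_spectrum(x)
    levels = []
    for f_lo, f_hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        power = sum(sp[k] for k in range(n_fft // 2 + 1) if f_lo <= k * fs / n_fft < f_hi)
        levels.append(10.0 * math.log10(power) if power > 0.0 else float("-inf"))
    return levels

# Example: a 1 kHz cosine sampled at 8 kHz falls into the band from 920 Hz to 1270 Hz.
fs = 8000.0
x = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(64)]
levels = band_levels_db(x, fs, [0.0, 510.0, 920.0, 1270.0, 4000.0])
print(levels.index(max(levels)))
```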

Absolute Threshold. The absolute threshold is set such that a 4 kHz signal with peak
amplitude ±1 LSB for a 16-bit representation lies at the lower limit of the absolute
threshold curve. Every masking threshold calculated in individual critical bands, which lies
below the absolute threshold, is set to a value equal to the absolute threshold in the
corresponding band. Since the absolute threshold within a critical band varies for low and
high frequencies, it is necessary to make use of the mean absolute threshold within a band.

Masking Threshold. The offset between signal level and the masking threshold in critical
band i (see Fig. 9.7) is given by [Hel72]

$$ \frac{O(i)}{\mathrm{dB}} = \alpha(14.5 + i) + (1 - \alpha)a_v, \tag{9.7} $$

where $\alpha$ denotes the tonality index and $a_v$ is the masking index. The masking index
[Kap92] is given by

$$ a_v = -2 - 2.05\arctan\left(\frac{f}{4\,\mathrm{kHz}}\right) - 0.75\arctan\left(\frac{f^2}{2.56\,\mathrm{kHz}^2}\right). \tag{9.8} $$

As an approximation,

$$ \frac{O(i)}{\mathrm{dB}} = \alpha(14.5 + i) + (1 - \alpha)5.5 \tag{9.9} $$

can be used [Joh88a, Joh88b]. If a tone is masking a noise-like signal ($\alpha = 1$), the
threshold is set $14.5 + i$ dB below the value of $L_S(i)$. If a noise-like signal is masking
a tone ($\alpha = 0$), the threshold is set 5.5 dB below $L_S(i)$. In order to recognize a
tonal or noise-like signal within a certain number of samples, the spectral flatness measure
SFM is estimated.
The SFM is defined by the ratio of the geometric to arithmetic mean value of $S_p(i)$ as

$$ \mathrm{SFM} = 10\log_{10}\left(\frac{\left[\prod_{k=1}^{N/2} S_p(e^{j(2\pi k/N)})\right]^{1/(N/2)}}{\frac{1}{N/2}\sum_{k=1}^{N/2} S_p(e^{j(2\pi k/N)})}\right). \tag{9.10} $$

The SFM is compared with the SFM of a sinusoidal signal (definition
$\mathrm{SFM}_{\max} = -60$ dB) and the tonality index is calculated [Joh88a, Joh88b] by

$$ \alpha = \min\left(\frac{\mathrm{SFM}}{\mathrm{SFM}_{\max}},\, 1\right). \tag{9.11} $$

SFM = 0 dB corresponds to a noise-like signal and leads to $\alpha = 0$, whereas
SFM = $-75$ dB gives a tone-like signal ($\alpha = 1$). With the sound pressure level
$L_S(i)$ and the offset $O(i)$ the masking threshold is given by

$$ T(i) = 10^{[L_S(i) - O(i)]/10}. \tag{9.12} $$
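The chain from spectral flatness to masking threshold, (9.9) to (9.12), can be sketched as below. It assumes strictly positive power values (a zero bin would make the geometric mean vanish) and uses the approximation (9.9) for the offset. All function names are illustrative.

```python
import math

SFM_MAX_DB = -60.0  # SFM assigned to a sinusoidal signal in the text

def sfm_db(sp):
    """Spectral flatness (9.10): geometric over arithmetic mean of S_p, in dB."""
    geometric = math.exp(sum(math.log(s) for s in sp) / len(sp))
    arithmetic = sum(sp) / len(sp)
    return 10.0 * math.log10(geometric / arithmetic)

def tonality_index(sp):
    """Tonality index alpha according to (9.11)."""
    return min(sfm_db(sp) / SFM_MAX_DB, 1.0)

def masking_threshold(level_db, band_index, alpha):
    """Threshold T(i) of (9.12) with the approximate offset O(i) of (9.9)."""
    offset_db = alpha * (14.5 + band_index) + (1.0 - alpha) * 5.5
    return 10.0 ** ((level_db - offset_db) / 10.0)

flat = [1.0] * 64                 # noise-like: flat spectrum
peaky = [1e-9] * 63 + [1.0]       # tone-like: one dominant spectral line
print(tonality_index(flat), tonality_index(peaky))
```

For a flat spectrum the geometric and arithmetic means coincide, so the SFM is 0 dB and the offset reduces to 5.5 dB. A single dominant line drives the SFM far below $-60$ dB, so the tonality index saturates at 1.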

Masking across Critical Bands. Masking across critical bands can be carried out with the
help of the Bark scale. The masking threshold is of a triangular form which decreases at
$S_1$ dB per Bark for the lower slope and at $S_2$ dB per Bark for the upper slope, depending
on the sound pressure level $L_i$ and the center frequency $f_{c_i}$ in band i (see [Ter79])
according to

$$ S_1 = 27\ \mathrm{dB/Bark}, \tag{9.13} $$

$$ S_2 = 24 + 0.23\left(\frac{f_{c_i}}{\mathrm{kHz}}\right)^{-1} - 0.2\,\frac{L_i}{\mathrm{dB}}\ \ \mathrm{dB/Bark}. \tag{9.14} $$

An approximation of the minimum masking within a critical band can be made using
Fig. 9.8 [Thei88, Sauv90]. Masking at the upper frequency $f_{u_i}$ in the critical band i is
responsible for masking the quantization noise with approximately 32 dB using the lower
masking threshold that decreases by 27 dB/Bark. The upper slope has a steepness which
depends on the sound pressure level. This steepness is lower than the steepness of the lower
slope. Masking across critical bands is presented in Fig. 9.9. The masking signal in critical
band $i - 1$ is responsible for masking the quantization noise in critical band i as well as
the masking signal in critical band i. This kind of masking across critical bands further
reduces the number of quantization steps within critical bands.




Figure 9.7 Offset between signal level and masking threshold. 

An analytical expression for masking across critical bands [Schr79] is given by

$$ 10\log_{10}[B(\Delta i)] = 15.81 + 7.5(\Delta i + 0.474) - 17.5\left[1 + (\Delta i + 0.474)^2\right]^{1/2}. \tag{9.15} $$

$\Delta i$ denotes the distance between two critical bands in Bark. Expression (9.15) is
called the spreading function. With the help of this spreading function, masking of critical
band i by critical band j can be calculated [Joh88a, Joh88b] with
$\mathrm{abs}(i - j) \le 25$ such that

$$ S_m(i) = \sum_{j=0}^{24} B(i - j)\cdot S_p(j). \tag{9.16} $$

Figure 9.8 Masking within a critical band.

Figure 9.9 Masking across critical bands.

The masking across critical bands can therefore be expressed as a matrix operation given by

$$ \begin{bmatrix} S_m(0) \\ S_m(1) \\ \vdots \\ S_m(24) \end{bmatrix} = \begin{bmatrix} B(0) & B(-1) & B(-2) & \cdots & B(-24) \\ B(1) & B(0) & B(-1) & \cdots & B(-23) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ B(24) & B(23) & B(22) & \cdots & B(0) \end{bmatrix} \begin{bmatrix} S_p(0) \\ S_p(1) \\ \vdots \\ S_p(24) \end{bmatrix}. \tag{9.17} $$
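The spreading function and the matrix operation across critical bands can be sketched in a few lines. This is an illustration only; `spreading_db` and `spread_band_powers` are hypothetical names, and the band powers are taken as a plain list indexed by the Bark band number.

```python
import math

def spreading_db(delta):
    """Spreading function 10*log10[B(delta)] in dB for a band distance delta in Bark."""
    shifted = delta + 0.474
    return 15.81 + 7.5 * shifted - 17.5 * math.sqrt(1.0 + shifted ** 2)

def spread_band_powers(sp):
    """S_m(i) = sum_j B(i - j) * S_p(j), the matrix-vector product across bands."""
    n = len(sp)
    b = {d: 10.0 ** (spreading_db(d) / 10.0) for d in range(-(n - 1), n)}
    return [sum(b[i - j] * sp[j] for j in range(n)) for i in range(n)]

# A single masker in band 10 spreads further toward higher bands than toward lower ones.
sp = [0.0] * 25
sp[10] = 1.0
sm = spread_band_powers(sp)
print(round(sm[9], 3), round(sm[10], 3), round(sm[11], 3))
```

The spreading function is approximately 0 dB at zero distance and falls off more slowly for positive distances, which reflects the flatter upper slope of the masking threshold.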



A renewed calculation of the masking threshold with (9.16) leads to the global masking
threshold

$$ T_m(i) = 10^{\log_{10} S_m(i) - O(i)/10}. \tag{9.18} $$

For a clarification of the single steps of a psychoacoustic-based audio coding we summarize
the operations with exemplified analysis results:

• calculation of the signal power $S_p(i)$ in critical bands → $L_S(i)$ in dB (Fig. 9.10a);

• calculation of masking across critical bands $T_m(i)$ → $L_{T_m}(i)$ in dB (Fig. 9.10b);

• masking with tonality index → $L_{T_m}(i)$ in dB (Fig. 9.10c);

• calculation of global masking threshold with respect to threshold in quiet $L_{T_q}$ →
  $L_{T_m,\mathrm{abs}}(i)$ in dB (Fig. 9.10d).

With the help of the global masking threshold $L_{T_m,\mathrm{abs}}(i)$ we calculate the
signal-to-mask ratio

$$ \mathrm{SMR}(i) = L_S(i) - L_{T_m,\mathrm{abs}}(i) \ \text{in dB} \tag{9.19} $$

per Bark band. This signal-to-mask ratio defines the necessary number of bits per critical
band, such that masking of quantization noise is achieved. For the given example the signal
power and the global masking threshold are shown in Fig. 9.11a. The resulting
signal-to-mask ratio SMR(i) is shown in Fig. 9.11b. As soon as SMR(i) > 0, one has to
allocate bits to the critical band i. For SMR(i) < 0 the corresponding critical band will not
be transmitted. Figure 9.12 shows the masking thresholds in critical bands for a sinusoid of
440 Hz. Compared to the first example, the influence of masking thresholds across critical
bands is easier to observe and interpret.



a) Signal power and masking threshold in critical bands 





Figure 9.11 Calculation of the signal-to-mask ratio SMR. 



9.4 ISO-MPEG-1 Audio Coding 

In this section, the coding method for digital audio signals is described which is specified in
the standard ISO/IEC 11172-3 [ISO92]. The filter banks used for subband decomposition,
the psychoacoustic models, dynamic bit allocation and coding are discussed. A simplified
block diagram of the coder for implementing layers I and II of the standard is shown in
Fig. 9.13. The corresponding decoder is shown in Fig. 9.14. It uses the information from
the ISO-MPEG 1 frame and feeds the decoded subband signals to a synthesis filter bank for
reconstructing the broad-band PCM signal. The complexity of the decoder is, in contrast
to the coder, significantly lower. Prospective improvements of the coding method are made
entirely in the coder.

Figure 9.12 Calculation of psychoacoustic model for a pure sinusoid with 440 Hz:
a) signal power in critical bands, b) masking across critical bands, c) masking with
tonality index, SFM = -90.82 dB, d) masking with absolute threshold.

Figure 9.13 Simplified block diagram of an ISO-MPEG 1 coder.



9.4.1 Filter Banks 

The subband decomposition is done with a pseudo-QMF filter bank (see Fig. 9.15). The
theoretical background is found in the related literature [Rot83, Mas85, Vai93]. The
broad-band signal is decomposed into M uniformly spaced subbands. The subbands are
processed further after a sampling rate reduction by a factor of M. The implementation of an
ISO-MPEG 1 coder is based on M = 32 frequency bands. The individual band-pass filters
$H_0(z) \cdots H_{M-1}(z)$ are designed using a prototype low-pass filter $H(z)$ and
frequency-shifted versions. The frequency shifting of the prototype with cutoff frequency
$\pi/2M$ is done by modulating the impulse response $h(n)$ with a cosine term [Bos02]
according to

$$ h_k(n) = h(n)\cdot\cos\left(\frac{\pi}{32}(k + 0.5)(n - 16)\right), \tag{9.20} $$

$$ f_k(n) = 32\cdot h(n)\cdot\cos\left(\frac{\pi}{32}(k + 0.5)(n + 16)\right), \tag{9.21} $$

with k = 0, ..., 31 and n = 0, ..., 511. The band-pass filters have bandwidth $\pi/M$. For
the synthesis filter bank, corresponding filters $F_0(z) \cdots F_{M-1}(z)$ give outputs which
are added together, resulting in a broad-band PCM signal. The prototype impulse response with
512 taps, the modulated band-pass impulse responses, and the corresponding magnitude
responses are shown in Fig. 9.16. The magnitude responses of all 32 band-pass filters are
also shown. The overlap of neighboring band-pass filters is limited to the lower and upper
filter band. This overlap reaches up to the center frequency of the neighboring bands. The
resulting aliasing after downsampling in each subband will be canceled in the synthesis
filter bank. The pseudo-QMF filter bank can be implemented by the combination of a
polyphase filter structure followed by a discrete cosine transform [Rot83, Vai93, Kon94].

Figure 9.14 Simplified block diagram of an ISO-MPEG 1 decoder.

To increase the frequency resolution, layer III of the standard decomposes each of the
32 subbands further into a maximum of 18 uniformly spaced subbands (see Fig. 9.17). The
decomposition is carried out with the help of an overlapped transform of windowed
subband samples. The method is based on a modified discrete cosine transform, also known as
the TDAC filter bank (Time Domain Aliasing Cancellation) and MLT (Modulated Lapped
Transform). An exact description is given in [Pri87, Mal92]. This extended filter bank
is referred to as the polyphase/modified discrete cosine transform (MDCT) hybrid filter
bank [Bra94]. The higher frequency resolution enables a higher coding gain but has the
disadvantage of a worse time resolution. This is observed for impulse-like signals. In order
to minimize these artifacts, the number of subbands per subband can be altered from 18
down to 6. Subband decompositions that are matched to the signal can be obtained by
specially designed window functions with overlapping transforms [Edl89, Edl95]. The
equivalence of overlapped transforms and filter banks is found in [Mal92, Glu93, Vai93,
Edl95, Vet95].

Figure 9.15 Pseudo-QMF filter bank.



9.4.2 Psychoacoustic Models 

Two psychoacoustic models have been developed for layers I to III of the ISO-MPEG 1
standard. Both models can be used independently of each other for all three layers.
Psychoacoustic model 1 is used for layers I and II, whereas model 2 is used for layer III.
Owing to the numerous applications of layers I and II, we discuss psychoacoustic model 1 in
this subsection.

Bit allocation in each of the 32 subbands is carried out using the signal-to-mask ratio 
SMR(i). This is based on the minimum masking threshold and the maximum signal level 
within a subband. In order to calculate this ratio, the power density spectrum is estimated 
with the help of a short-time FFT in parallel with the analysis filter bank. As a consequence, 
a higher frequency resolution is obtained for estimating the power density spectrum in 
contrast to the frequency resolution of the 32-band analysis filter bank. The signal-to-mask 
ratio for every subband is determined as follows: 

1. Calculate the power density spectrum of a block of N samples using FFT. After
   windowing a block of N = 512 (N = 1024 for layer II) input samples, the power
   density spectrum

$$ X(k) = 10\log_{10}\left|\frac{1}{N}\sum_{n=0}^{N-1} h(n)x(n)e^{-jnk2\pi/N}\right|^{2} \ \text{in dB} \tag{9.22} $$

   is calculated. Then the window $h(n)$ is displaced by 384 (12 · 32) samples and the
   next block is processed.

2. Determine the sound pressure level in every subband. The sound pressure level is
   derived from the calculated power density spectrum and by calculating a scaling
   factor in the corresponding subband as given by

$$ L_S(i) = \max\left[X(k),\ 20\log_{10}[\mathrm{SCF}_{\max}(i)\cdot 32768] - 10\right] \ \text{in dB}. \tag{9.23} $$

   For X(k), the maximum of the spectral lines in a subband is used. The scaling factor
   SCF(i) for subband i is calculated from the absolute value of the maximum of 12
   consecutive subband samples. A nonlinear quantization to 64 levels is carried out
   (layer I). For layer II, the sound pressure level is determined by choosing the largest
   of the three scaling factors from 3 · 12 subband samples.

3. Consider the absolute threshold. The absolute threshold $LT_q(m)$ is specified for
   different sampling rates in [ISO92]. The frequency index m is based on a reduction
   of N/2 relevant frequencies with the FFT of index k (see Fig. 9.18). The subband
   index is still i.

Figure 9.18 Nomenclature of frequency indices.

4. Calculate tonal $X_{tm}(k)$ or non-tonal $X_{nm}(k)$ masking components and determine
   relevant masking components (for details see [ISO92]). These masking components
   are denoted by $X_{tm}[z(j)]$ and $X_{nm}[z(j)]$. With the index j, tonal and non-tonal
   masking components are labeled. The variable z(m) is listed for reduced frequency
   indices m in [ISO92]. It allows a finer resolution of the 24 critical bands with the
   frequency group index z.

5. Calculate the individual masking thresholds. For masking thresholds of tonal and
   non-tonal masking components $X_{tm}[z(j)]$ and $X_{nm}[z(j)]$, the following calculation
   is performed:

$$ LT_{tm}[z(j), z(m)] = X_{tm}[z(j)] + a_{v_{tm}}[z(j)] + v_f[z(j), z(m)] \ \mathrm{dB}, \tag{9.24} $$

$$ LT_{nm}[z(j), z(m)] = X_{nm}[z(j)] + a_{v_{nm}}[z(j)] + v_f[z(j), z(m)] \ \mathrm{dB}. \tag{9.25} $$

   The masking index for tonal masking components is given by

$$ a_{v_{tm}} = -1.525 - 0.275\cdot z(j) - 4.5 \ \text{in dB}, \tag{9.26} $$

   and the masking index for non-tonal masking components is

$$ a_{v_{nm}} = -1.525 - 0.175\cdot z(j) - 0.5 \ \text{in dB}. \tag{9.27} $$

   The masking function $v_f[z(j), z(m)]$ with distance $\Delta z = z(m) - z(j)$ is given by

$$ v_f = \begin{cases} 17\cdot(\Delta z + 1) - (0.4\cdot X[z(j)] + 6), & -3 \le \Delta z < -1 \\ (0.4\cdot X[z(j)] + 6)\cdot\Delta z, & -1 \le \Delta z < 0 \\ -17\cdot\Delta z, & 0 \le \Delta z < 1 \\ -(\Delta z - 1)\cdot(17 - 0.15\cdot X[z(j)]) - 17, & 1 \le \Delta z < 8 \end{cases} \quad \text{in dB,} $$

   with $\Delta z$ in Bark. This masking function $v_f[z(j), z(m)]$ describes the masking of
   the frequency index z(m) by the masking component z(j).

6. Calculate the global masking threshold. For frequency index m, the global masking
   threshold is calculated as the sum of all contributing masking components according to

$$ LT_g(m) = 10\log_{10}\left[10^{LT_q(m)/10} + \sum_{j=1}^{T_m} 10^{LT_{tm}[z(j), z(m)]/10} + \sum_{j=1}^{R_m} 10^{LT_{nm}[z(j), z(m)]/10}\right] \mathrm{dB}. \tag{9.28} $$

   The total number of tonal and non-tonal masking components are denoted as $T_m$ and
   $R_m$, respectively. For a given subband i, only masking components that lie in the
   range -8 to +3 Bark will be considered. Masking components outside this range are
   neglected.

7. Determine the minimum masking threshold in every subband:

$$ LT_{\min}(i) = \min\left[LT_g(m)\right] \ \mathrm{dB}. \tag{9.29} $$

   Several masking thresholds $LT_g(m)$ can occur in a subband as long as m lies within
   the subband i.

8. Calculate the signal-to-mask ratio SMR(i) in every subband:

$$ \mathrm{SMR}(i) = L_S(i) - LT_{\min}(i) \ \mathrm{dB}. \tag{9.30} $$

The signal-to-mask ratio determines the dynamic range that has to be quantized in the
particular subband so that the level of quantization noise lies below the masking threshold.
The signal-to-mask ratio is the basis for the bit allocation procedure for quantizing the
subband signals.

9.4.3 Dynamic Bit Allocation and Coding

Dynamic Bit Allocation. Dynamic bit allocation is used to determine the number of bits
that are necessary for the individual subbands so that a transparent perception is possible.
The minimum number of bits in subband i can be determined from the difference between
scaling factor SCF(i) and the absolute threshold $LT_q(i)$ as $b(i) = \mathrm{SCF}(i) - LT_q(i)$.
With this, quantization noise remains under the masking threshold. Masking across critical
bands is used for the implementation of the ISO-MPEG 1 coding method.
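Since the bit allocation builds on the thresholds of psychoacoustic model 1, a minimal sketch of its masking function $v_f$ and of the power summation of the global masking threshold (steps 5 and 6 of Section 9.4.2) may be helpful. The function names are hypothetical, and levels are passed in dB.

```python
import math

def masking_function_db(dz, x_db):
    """Piecewise masking function v_f for distance dz (Bark) and masker level x_db (dB)."""
    if -3.0 <= dz < -1.0:
        return 17.0 * (dz + 1.0) - (0.4 * x_db + 6.0)
    if -1.0 <= dz < 0.0:
        return (0.4 * x_db + 6.0) * dz
    if 0.0 <= dz < 1.0:
        return -17.0 * dz
    if 1.0 <= dz < 8.0:
        return -(dz - 1.0) * (17.0 - 0.15 * x_db) - 17.0
    return None  # outside the range in which masking components are considered

def global_threshold_db(lt_q_db, individual_thresholds_db):
    """Add the threshold in quiet and all individual masking thresholds as powers."""
    total = 10.0 ** (lt_q_db / 10.0)
    total += sum(10.0 ** (lt / 10.0) for lt in individual_thresholds_db)
    return 10.0 * math.log10(total)

# Two equally strong contributions raise the threshold by about 3 dB.
print(round(global_threshold_db(-100.0, [40.0, 40.0]), 2))
```

The masking function is continuous at its segment boundaries, for example both neighboring branches give the same value at a distance of -1 Bark.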

For a given transmission rate, the maximum possible number of bits $B_m$ for coding
subband signals and scaling factors is calculated as

$$ B_m = \sum_{i=1}^{32} b(i) + \mathrm{SCF}(i) + \text{additional information}. \tag{9.31} $$

The bit allocation is performed within an allocation frame consisting of 12 subband samples
(384 = 12 · 32 PCM samples) for layer I and 36 subband samples (1152 = 36 · 32 PCM
samples) for layer II.

The dynamic bit allocation for the subband signals is carried out as an iterative
procedure. At the beginning, the number of bits per subband is set to zero. First, the
mask-to-noise ratio

$$ \mathrm{MNR}(i) = \mathrm{SNR}(i) - \mathrm{SMR}(i) \tag{9.32} $$

is determined for every subband. The signal-to-mask ratio SMR(i) is the result of the
psychoacoustic model. The signal-to-noise ratio SNR(i) is defined by a table in [ISO92],
in which for every number of bits a corresponding signal-to-noise ratio is specified. The
number of bits must be increased as long as the mask-to-noise ratio MNR is less than zero.

The iterative bit allocation is performed by the following steps.

1. Determination of the minimum MNR(i) of all subbands.

2. Increasing the number of bits of these subbands on to the next stage of the MPEG-1
   standard. Allocation of 6 bits for the scaling factor of the MPEG-1 standard when the
   number of bits is increased for the first time.

3. New calculation of MNR(i) in this subband.

4. Calculation of the number of bits for all subbands and scaling factors and comparison
   with the maximum number $B_m$. If the number of bits is smaller than the maximum
   number, the iteration starts again with step 1.

Quantization and Coding of Subband Signals. The quantization of the subband signals
is done with the allocated bits for the corresponding subband. The 12 (36) subband samples
are divided by the corresponding scaling factor and then linearly quantized and coded (for
details see [ISO92]). This is followed by a frame packing. In the decoder, the procedure is
reversed. The decoded subband signals with different word-lengths are reconstructed into a
broad-band PCM signal with a synthesis filter bank (see Fig. 9.14). MPEG-1 audio coding
has a one- or a two-channel stereo mode with sampling frequencies of 32, 44.1, and 48 kHz
and a bit rate of 128 kbit/s per channel.
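The iterative bit allocation described above can be sketched as a greedy loop. Several simplifications are assumed here and are not part of the standard: the SNR table of [ISO92] is replaced by a rule of thumb of about 6.02 dB per bit, the word-length grows in single-bit steps, and the loop stops once every MNR is non-negative or the bit budget is exhausted. The function name and parameters are illustrative.

```python
def allocate_bits(smr_db, bit_budget, samples_per_band=12, scf_bits=6, max_bits=15):
    """Greedy allocation: give one more bit per sample to the subband with the lowest MNR."""
    bits = [0] * len(smr_db)
    used = 0

    def mnr_db(i):
        # Assumption: SNR of about 6.02 dB per bit instead of the table in the standard.
        return 6.02 * bits[i] - smr_db[i]

    while True:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr_db)       # step 1: minimum MNR
        if mnr_db(worst) >= 0.0:
            break                                 # noise is masked in every subband
        cost = samples_per_band                   # one more bit for each subband sample
        if bits[worst] == 0:
            cost += scf_bits                      # scaling factor on first allocation
        if used + cost > bit_budget:
            break                                 # step 4: budget exhausted
        bits[worst] += 1                          # steps 2 and 3
        used += cost
    return bits, used

print(allocate_bits([20.0, 5.0, -10.0], bit_budget=1000))
```

A subband with a negative signal-to-mask ratio starts with a positive MNR and therefore never receives bits, which corresponds to subbands that are not transmitted.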



9.5 MPEG-2 Audio Coding 

The aim of the introduction of MPEG-2 audio coding was the extension of MPEG-1 to
lower sampling frequencies and multichannel coding [Bos97]. Backward compatibility
to existing MPEG-1 systems is achieved through the version MPEG-2 BC (Backward
Compatible), and the extension toward lower sampling frequencies of 16, 22.05 and 24 kHz
through the version MPEG-2 LSF (Lower Sampling Frequencies). The bit rate for a
five-channel MPEG-2 BC coding with full bandwidth of all channels is 640-896 kbit/s.



9.6 MPEG-2 Advanced Audio Coding 

To improve the coding of mono, stereo, and multichannel audio signals the MPEG-2 AAC
(Advanced Audio Coding) standard was specified. This coding standard is not backward
compatible with the MPEG-1 standard and forms the kernel for new extended coding
standards such as MPEG-4. The achievable bit rate for a five-channel coding is 320 kbit/s.
In the following the main signal processing steps for MPEG-2 AAC are introduced and
the principal functionalities explained. An extensive explanation can be found in [Bos97,
Bra98, Bos02]. The MPEG-2 AAC coder is shown in Fig. 9.19. The corresponding decoder
performs the functional units in reverse order with corresponding decoder functionalities.

Pre-processing. The input signal will be band-limited according to the sampling frequency. 
This step is used only in the scalable sampling rate profile [Bos97, Bra98, Bos02]. 

Filter bank. The time-frequency decomposition into M = 1024 subbands with an
overlapped MDCT [Pri86, Pri87] is based on blocks of N = 2048 input samples. A stepwise
explanation of the implementation is given. A graphical representation of the single steps
is shown in Fig. 9.20. The single steps are as follows:

1. Partitioning of the input signal x(n) with time index n into overlapped blocks

$$ x_m(r) = x(mM + r), \quad r = 0, \ldots, N-1; \quad -\infty \le m \le \infty, \tag{9.33} $$

   of length N with an overlap (hop size) of M = N/2. The time index inside a block
   is denoted by r. The variable m denotes the block index.

2. Windowing of blocks with window function $w(r)$ → $x_m(r)\cdot w(r)$.

3. The MDCT

$$ X(m,k) = \sqrt{\frac{2}{M}}\sum_{r=0}^{N-1} x_m(r)w(r)\cos\left(\frac{\pi}{M}\left(k + \frac{1}{2}\right)\left(r + \frac{M+1}{2}\right)\right), \quad k = 0, \ldots, M-1, \tag{9.34} $$

   yields, for every M input samples, M = N/2 spectral coefficients from N windowed
   input samples.

4. Quantization of spectral coefficients $X(m,k)$ leads to quantized spectral coefficients
   $X_Q(m,k)$ based on a psychoacoustic model.

5. The IMDCT

$$ \hat{x}_m(r) = \sqrt{\frac{2}{M}}\sum_{k=0}^{M-1} X_Q(m,k)\cos\left(\frac{\pi}{M}\left(k + \frac{1}{2}\right)\left(r + \frac{M+1}{2}\right)\right), \quad r = 0, \ldots, N-1, \tag{9.35} $$

   yields, for every M input samples, N output samples in block $\hat{x}_m(r)$.

6. Windowing of inverse transformed block $\hat{x}_m(r)$ with window function $w(r)$.

7. Reconstruction of output signal y(n) by overlap-add operation according to

$$ y(n) = \sum_{m=-\infty}^{\infty} \hat{x}_m(r)w(r), \quad r = 0, \ldots, N-1, \tag{9.36} $$

   with overlap M.

Figure 9.19 MPEG-2 AAC coder and decoder.

Figure 9.20 Time-frequency decomposition with MDCT/IMDCT.

In order to explain the procedural steps we consider the MDCT/IMDCT of a sine pulse
shown in Fig. 9.21. The left column shows from the top down the input signal and partitions
of the input signal of block length N = 256. The window function is a sine window. The
corresponding MDCT coefficients of length M = 128 are shown in the middle column.
The IMDCT delivers the signals in the right column. One can observe that the inverse
transforms with the IMDCT do not exactly reconstruct the single input blocks. Moreover,
each output block consists of an input block and a special superposition of a time-reversed
input block circularly shifted by M = N/2, which is denoted by time-domain aliasing
[Pri86, Pri87, Edl89]. The overlap-add operation of the single output blocks perfectly
recovers the input signal which is shown in the top signal of the right column (Fig. 9.21).
For a perfect reconstruction of the output signal, the window function of the analysis
and synthesis step has to fulfill the condition $w^2(r) + w^2(r + M) = 1$, r = 0, ..., M - 1.

Figure 9.21 Signals of MDCT/IMDCT.

The Kaiser-Bessel derived window [Bos02] and a sine window
$h(n) = \sin\left(\left(n + \frac{1}{2}\right)\frac{\pi}{N}\right)$ with n = 0, ..., N - 1 [Mal92]
are applied. Figure 9.22 shows both window functions with N = 2048 and the corresponding
magnitude responses for a sampling frequency of $f_S$ = 44100 Hz. The sine window has a
smaller pass-band width but slower falling side lobes. In contrast, the Kaiser-Bessel derived
window shows a wider pass-band and a faster decay of the side lobes. In order to demonstrate
the filter bank properties and in particular the frequency decomposition of MDCT, we derive
the modulated band-pass impulse responses of the window functions (prototype impulse
response $w(n) = h(n)$) according to

$$ h_k(n) = 2\cdot h(n)\cdot\cos\left(\frac{\pi}{M}\left(k + \frac{1}{2}\right)\left(n + \frac{M+1}{2}\right)\right), \quad k = 0, \ldots, M-1; \ n = 0, \ldots, N-1. \tag{9.37} $$

Figure 9.23 shows the normalized prototype impulse response of the sine window and
the first two modulated band-pass impulse responses $h_0(n)$ and $h_1(n)$ together with the
corresponding magnitude responses. Besides the increased frequency resolution with
M = 1024 band-pass filters, the reduced stop-band attenuation can be observed.
A comparison of this magnitude response of the MDCT with the frequency resolution of
the PQMF filter bank with M = 32 in Fig. 9.16 points out the different properties of both
subband decompositions.

Figure 9.22 Kaiser-Bessel derived window and sine window for N = 2048 and magnitude responses
of the normalized window functions.
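The MDCT/IMDCT analysis-synthesis chain with a sine window can be checked numerically with a short sketch. It assumes a scale factor of $\sqrt{2/M}$ in both transforms, a phase term of $(M+1)/2$ and a hop size of M = N/2, and uses a direct summation rather than a fast algorithm. Function names are illustrative, and a small block length keeps the run time low.

```python
import math

def sine_window(n_len):
    """Sine window, which satisfies w^2(r) + w^2(r + M) = 1 for M = N/2."""
    return [math.sin((n + 0.5) * math.pi / n_len) for n in range(n_len)]

def mdct(block, w):
    """M = N/2 coefficients from N windowed samples (direct summation)."""
    m = len(block) // 2
    scale = math.sqrt(2.0 / m)
    return [scale * sum(block[r] * w[r]
                        * math.cos(math.pi / m * (k + 0.5) * (r + (m + 1) / 2.0))
                        for r in range(2 * m))
            for k in range(m)]

def imdct(coeffs, w):
    """N windowed output samples from M coefficients; they still contain time-domain aliasing."""
    m = len(coeffs)
    scale = math.sqrt(2.0 / m)
    return [w[r] * scale * sum(coeffs[k]
                               * math.cos(math.pi / m * (k + 0.5) * (r + (m + 1) / 2.0))
                               for k in range(m))
            for r in range(2 * m)]

def analysis_synthesis(x, m):
    """Block partitioning with hop size M, MDCT, IMDCT and overlap-add."""
    w = sine_window(2 * m)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * m + 1, m):
        out = imdct(mdct(x[start:start + 2 * m], w), w)
        for r in range(2 * m):
            y[start + r] += out[r]
    return y

x = [math.sin(0.3 * n) + 0.1 * n for n in range(48)]
y = analysis_synthesis(x, 8)
# Samples covered by two overlapping blocks are reconstructed; the first and last M are not.
print(max(abs(x[n] - y[n]) for n in range(8, 40)))
```

Without quantization the aliasing terms of neighboring blocks cancel in the overlap-add, so the interior samples are recovered up to rounding errors.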

For adjusting the time and frequency resolution to the properties of an audio signal
several methods have been investigated. Signal-adaptive audio coding based on the wavelet
transform can be found in [Sin93, Ern00]. Window switching can be applied for achieving
a time-variant time-frequency resolution for MDCT and IMDCT applications. For
stationary signals a high frequency resolution and a low time resolution are necessary.
This leads to long windows with N = 2048. Coding of attacks of instruments needs a
high time resolution (reduction of window length to N = 256) and thus reduces frequency
resolution (reduction of number of spectral coefficients). A detailed description of
switching between time-frequency resolution with the MDCT/IMDCT can be found in [Edl89,
Bos97, Bos02]. Examples of switching between different window functions and windows
of different length are shown in Fig. 9.24.



Temporal Noise Shaping. A further method for adapting the time-frequency resolution
of a filter bank and here an MDCT/IMDCT to the signal characteristic is based on linear
prediction along the spectral coefficients in the frequency domain [Her96, Her99]. This
method is called temporal noise shaping (TNS) and is a weighting of the temporal envelope
of the time-domain signal. Weighting the temporal envelope in this way is demonstrated in
Fig. 9.25.

Figure 9.23 Normalized impulse responses of sine window for N = 2048, modulated band-pass
impulse responses, and magnitude responses.

Figure 9.24 Switching of window functions.



Figure 9.25a shows a signal from a castanet attack. Making use of the discrete cosine 
transform (DCT, [Rao90]) 

X ca Hk) = ]T x(n) cos( ^ ), k — 0 N- 1 (9.38) 

n = 0 ' ' 





Figure 9.25 Attack of 



and the inverse discrete cosine transform (IDCT 



x(n) = J c kX C(2) (k ) cos(< 
' " i~n \ 
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1/V(2), k = 0, 
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the spectral coefficients of the DCT of this castanet attack are represented in Fig. 9.25b. 
After quantization of these spectral coefficients X (k) to 4 bits (Fig. 9.25d) and IDCT of 
the quantized spectral coefficients, the time-domain signal in Fig. 9.25c and the difference 
signal in Fig. 9.25e between input and output result. One can observe in the output and 
difference signal that the error is spread along the entire block length. This means that 
before the attack of the castanet happens, the error signal of the block is perceptible. 
The time-domain masking, referred to as pre-masking [Zwi90], is not sufficient. Ideally, 
the spreading of the error signal should follow the time-domain envelope of the signal 
itself. From forward linear prediction in the time domain it is known that the power spec- 
tral density of the error signal after coding and decoding is weighted by the envelope 
of the power spectral density of the input signal [Var06]. Performing a forward linear 
prediction along the frequency axis in the frequency domain and quantization and coding 
leads to an error signal in the time domain where the temporal envelope of the error 
signal follows the time-domain envelope of the input signal [Her96]. To point out the
temporal weighting of the error signal we consider the forward prediction in the time
domain in Fig. 9.26a. For coding, the input signal $x(n)$ is predicted by an impulse
response $p(n)$. The output of the predictor is subtracted from the input signal $x(n)$
and delivers the signal $d(n)$, which is then quantized to a reduced word-length. The
quantized signal $d_Q(n) = x(n) * a(n) + e(n)$ is the sum of the convolution of $x(n)$
with the impulse response $a(n) = \delta(n) - p(n)$ of the prediction error filter and
the additive quantization error $e(n)$. The power spectral density of the coder output is
$S_{D_Q D_Q}(e^{j\Omega}) = S_{XX}(e^{j\Omega}) \cdot |A(e^{j\Omega})|^2 + S_{EE}(e^{j\Omega})$.
The decoding operation performs the convolution of $d_Q(n)$ with the impulse response
$h(n)$ of the inverse system to the coder. Therefore $a(n) * h(n) = \delta(n)$ must hold
and thus $H(e^{j\Omega}) = 1/A(e^{j\Omega})$. Hereby the output signal
$y(n) = x(n) + e(n) * h(n)$ is derived with the corresponding discrete Fourier transform
$Y(k) = X(k) + E(k) \cdot H(k)$. The power spectral density of the decoder output signal
is given by
$S_{YY}(e^{j\Omega}) = S_{XX}(e^{j\Omega}) + S_{EE}(e^{j\Omega}) \cdot |H(e^{j\Omega})|^2$.
Here one can observe the spectral weighting of the quantization error with the spectral
envelope of the input signal, which is represented by $|H(e^{j\Omega})|$. The same kind of
forward prediction will now be applied in the frequency domain to the spectral
coefficients $X(k) = \mathrm{DCT}[x(n)]$ for a block of input samples $x(n)$ shown in
Fig. 9.26b. The output of the decoder is then given by $Y(k) = X(k) + E(k) * H(k)$ with
$A(k) * H(k) = \delta(k)$. Thus, the corresponding time-domain signal is
$y(n) = x(n) + e(n) \cdot h(n)$, where the temporal weighting of the quantization error
with the temporal envelope of the input signal is clearly evident. The temporal
envelope is represented by the absolute value $|h(n)|$ of the impulse response $h(n)$. The
relation between the temporal signal envelope (absolute value of the analytical signal)
and the autocorrelation function of the analytical spectrum is discussed in [Her96]. The
dualities between forward linear prediction in time and frequency domain are summarized
in Table 9.2. Figure 9.27 demonstrates the operations for temporal noise shaping in the
coder, where the prediction is performed along the spectral coefficients. The coefficients of
the forward predictor have to be transmitted to the decoder, where the inverse filtering is
performed along the spectral coefficients.
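
The duality summarized in Table 9.2 can be checked numerically. The following is a
minimal sketch, assuming the DFT as transform (as in the labels of Fig. 9.26) instead of
the DCT, a random stand-in for the quantization error e(n) and an arbitrary decaying
weighting h(n); all variable names are illustrative. It verifies that the product
e(n) · h(n) in the time domain corresponds to a circular convolution of E(k) and H(k),
scaled by 1/N for an unnormalized forward DFT.

```python
import numpy as np

def circular_convolve(a, b):
    """Circular convolution of two length-N sequences, computed directly."""
    N = len(a)
    out = np.zeros(N, dtype=complex)
    for k in range(N):
        for m in range(N):
            out[k] += a[m] * b[(k - m) % N]
    return out

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)           # input block x(n)
e = 0.01 * rng.standard_normal(N)    # stand-in for the quantization error e(n)
h = np.exp(-np.arange(N) / 8.0)      # illustrative temporal weighting h(n)

# Time-domain form for prediction in the frequency domain: y(n) = x(n) + e(n) . h(n)
y = x + e * h

# Frequency-domain form: Y(k) = X(k) + E(k) * H(k), with * a circular convolution.
# NumPy's forward DFT is unnormalized, so the convolution carries a factor 1/N.
X, E, H = np.fft.fft(x), np.fft.fft(e), np.fft.fft(h)
Y = X + circular_convolve(E, H) / N

# Largest deviation between the two descriptions (should be at rounding-error level)
max_dev = np.max(np.abs(np.fft.fft(y) - Y))
```

The error term is large only where h(n) is large, which is the temporal weighting
exploited by temporal noise shaping.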

Figure 9.26 Forward prediction in time and frequency domain. a) Forward prediction in
time domain (coder and decoder): the error signal e(n) after the decoder is spectrally
weighted with the spectral envelope H(k) = DFT[h(n)]. b) Forward prediction in frequency
domain (coder and decoder): the error signal e(n) after the decoder is temporally
weighted with the temporal envelope h(n) = IDFT[H(k)].

Figure 9.27 Temporal noise shaping with forward prediction in frequency domain.

Table 9.2 Forward prediction in time and frequency domain.

Prediction in time domain          Prediction in frequency domain

y(n) = x(n) + e(n) * h(n)          y(n) = x(n) + e(n) · h(n)

Y(k) = X(k) + E(k) · H(k)          Y(k) = X(k) + E(k) * H(k)

The temporal weighting is finally demonstrated in Fig. 9.28, where the corresponding
signals with forward prediction in the frequency domain are shown. Figure 9.28a,b shows
the castanet signal x(n) and its corresponding spectral coefficients X(k) of the applied
DCT. The forward prediction delivers D(k) in Fig. 9.28d and the quantized signal D_Q(k)
in Fig. 9.28f. After the decoder the signal Y(k) in Fig. 9.28h is reconstructed by the inverse
transfer function. The IDCT of Y(k) finally results in the output signal y(n) in Fig. 9.28e.
The difference signal x(n) - y(n) in Fig. 9.28g demonstrates the temporal weighting of the
error signal with the temporal envelope from Fig. 9.28c. For this example, the order of the
predictor is chosen as 20 [Bos97] and the prediction along the spectral coefficients X(k) is
performed by the Burg method. The prediction gain for this signal in the frequency domain
is G_p = 16 dB (see Fig. 9.28d).
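
The processing chain of this example can be sketched in a few lines. This is a
simplified illustration and not the implementation behind Fig. 9.28: it assumes a
synthetic decaying noise burst instead of the castanet signal, an orthonormal DCT-II
built as a matrix, a predictor of order 8 obtained with the autocorrelation method
(not the Burg method), and a uniform quantizer with a hypothetical step size. All
function names are made up for this sketch.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def lpc_autocorr(s, order):
    """Predictor coefficients p(1..order) from the autocorrelation normal equations."""
    r = np.array([np.dot(s[: len(s) - m], s[m:]) for m in range(order + 1)])
    r[0] *= 1.0 + 1e-9                      # small regularization of the diagonal
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def tns_codec(x, order, step):
    """Forward prediction along the DCT coefficients, quantization, and decoding."""
    N = len(x)
    C = dct_matrix(N)
    X = C @ x
    p = lpc_autocorr(X, order)
    # Coder: D(k) = X(k) - sum_i p_i X(k - i)
    D = X.copy()
    for k in range(N):
        for i in range(1, min(order, k) + 1):
            D[k] -= p[i - 1] * X[k - i]
    DQ = step * np.round(D / step)          # uniform quantizer (assumption)
    # Decoder: Y(k) = DQ(k) + sum_i p_i Y(k - i), the inverse filter along k
    Y = np.zeros(N)
    for k in range(N):
        Y[k] = DQ[k]
        for i in range(1, min(order, k) + 1):
            Y[k] += p[i - 1] * Y[k - i]
    return C.T @ Y

def plain_codec(x, step):
    """Reference without prediction: quantize the DCT coefficients directly."""
    C = dct_matrix(len(x))
    return C.T @ (step * np.round((C @ x) / step))

def fraction_after_onset(err, onset):
    """Share of the error energy that falls at or after the onset sample."""
    return np.sum(err[onset:] ** 2) / np.sum(err ** 2)

rng = np.random.default_rng(1)
N, n0 = 256, 128
x = np.zeros(N)                              # silence before the attack
x[n0:] = rng.standard_normal(N - n0) * np.exp(-np.arange(N - n0) / 20.0)

step = 0.01
err_tns = x - tns_codec(x, order=8, step=step)
err_plain = x - plain_codec(x, step)
frac_tns = fraction_after_onset(err_tns, n0)
frac_plain = fraction_after_onset(err_plain, n0)
```

Without prediction the error spreads over the whole block, so a large part of it lies
before the attack (pre-echo). With prediction along the spectral coefficients the error
energy moves under the signal, which shows up as a larger value of `frac_tns`.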

Frequency-domain Prediction. A further compression of the band-pass signals is possible
by using linear prediction. A backward prediction [Var06] of the band-pass signals is
applied on the coder side (see Fig. 9.29). With backward prediction the predictor
coefficients need not be coded and transmitted to the decoder, since the estimate of the input
sample is based on the quantized signal. The decoder derives the predictor coefficients p(n)
in the same way from the quantized input. A second-order predictor is sufficient, because
the bandwidth of the band-pass signals is very low [Bos97].
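
Why no coefficients have to be transmitted can be seen in a small sketch. It assumes a
second-order predictor adapted with a normalized LMS rule and a uniform quantizer; both
are illustrative choices and not the adaptation specified for MPEG-2 AAC. The class and
function names are hypothetical. Coder and decoder update the predictor only from the
quantized prediction error and the reconstructed samples, so both stay in step.

```python
import numpy as np

class BackwardPredictor:
    """Second-order predictor adapted from reconstructed samples only (NLMS, assumed)."""

    def __init__(self, mu=0.5, eps=1e-6):
        self.p = np.zeros(2)       # predictor coefficients p(1), p(2)
        self.past = np.zeros(2)    # last two reconstructed samples
        self.mu = mu
        self.eps = eps

    def predict(self):
        return float(np.dot(self.p, self.past))

    def update(self, dq, reconstructed):
        # dq is the quantized prediction error, known to coder and decoder alike
        norm = np.dot(self.past, self.past) + self.eps
        self.p = self.p + self.mu * dq * self.past / norm
        self.past = np.array([reconstructed, self.past[0]])

def encode(x, step):
    """Returns the quantized prediction error and the coder-side reconstruction."""
    pred, dq_all, rec = BackwardPredictor(), [], []
    for sample in x:
        x_hat = pred.predict()
        dq = step * np.round((sample - x_hat) / step)
        xr = x_hat + dq
        pred.update(dq, xr)
        dq_all.append(dq)
        rec.append(xr)
    return np.array(dq_all), np.array(rec)

def decode(dq_all):
    """Rebuilds the signal from the quantized prediction error alone."""
    pred, out = BackwardPredictor(), []
    for dq in dq_all:
        xr = pred.predict() + dq
        pred.update(dq, xr)
        out.append(xr)
    return np.array(out)

n = np.arange(2000)
x = np.sin(2 * np.pi * 0.01 * n)   # slowly varying, narrow-band test signal
step = 0.01
dq, coder_rec = encode(x, step)
y = decode(dq)
```

The decoder output `y` equals the reconstruction inside the coder, and the
reconstruction error stays within half a quantization step.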





Figure 9.29 Backward prediction of band-pass signals.



Mono/Side Coding. Coding of stereo signals with left and right signals $x_L(n)$ and $x_R(n)$
can be achieved by coding a mono signal (M) $x_M(n) = (x_L(n) + x_R(n))/2$ and a side (S,
difference) signal $x_S(n) = (x_L(n) - x_R(n))/2$ (M/S coding). Since for highly correlated
left and right signals the power of the side signal is reduced, a reduction in bit rate for this
signal can be achieved. The decoder can reconstruct the left signal $x_L(n) = x_M(n) + x_S(n)$
and the right signal $x_R(n) = x_M(n) - x_S(n)$, if no quantization and coding is applied to the
mono and side signal. This M/S coding is carried out for MPEG-2 AAC [Bra98, Bos02]
with the spectral coefficients of a stereo signal (see Fig. 9.30).

Figure 9.30 M/S coding in frequency domain.
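
The M/S equations translate directly into code. The sketch below works on time-domain
samples for brevity (MPEG-2 AAC applies the same sums and differences to spectral
coefficients); the test signals and function names are assumptions made for this
illustration.

```python
import numpy as np

def ms_encode(x_l, x_r):
    """Mono signal (L + R)/2 and side signal (L - R)/2."""
    return 0.5 * (x_l + x_r), 0.5 * (x_l - x_r)

def ms_decode(x_m, x_s):
    """Left = M + S, right = M - S."""
    return x_m + x_s, x_m - x_s

rng = np.random.default_rng(0)
common = rng.standard_normal(1024)                 # shared content of both channels
x_l = common + 0.05 * rng.standard_normal(1024)    # highly correlated left channel
x_r = common + 0.05 * rng.standard_normal(1024)    # highly correlated right channel

x_m, x_s = ms_encode(x_l, x_r)
y_l, y_r = ms_decode(x_m, x_s)

# Power of the side signal relative to the left channel
side_to_left_power = np.mean(x_s ** 2) / np.mean(x_l ** 2)
```

Without quantization the reconstruction is exact, and for these strongly correlated
channels the side signal carries only a small fraction of the channel power.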



Intensity Stereo Coding. For intensity stereo (IS) coding a mono signal
$x_M(n) = x_L(n) + x_R(n)$ and two temporal envelopes $e_L(n)$ and $e_R(n)$ of the left and
right signals are coded and transmitted. On the decoding side the left signal is
reconstructed by $y_L(n) = x_M(n) \cdot e_L(n)$ and the right signal by
$y_R(n) = x_M(n) \cdot e_R(n)$. This reconstruction is lossy. The IS
coding of MPEG-2 AAC [Bra98] is performed by summation of spectral coefficients of
both signals and by coding of scale factors which represent the temporal envelope of both
signals (see Fig. 9.31). This type of stereo coding is only useful for higher frequency bands,
since human hearing is insensitive to phase shifts at frequencies above 2 kHz.




Figure 9.31 Intensity stereo coding in frequency domain.
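
A minimal sketch of the idea in the frequency domain follows. It assumes that the two
envelopes are represented by one gain per band and channel, chosen so that the band
energies of the original channels are preserved; this is a simplification for
illustration and not the scale-factor coding of the standard. Band edges, signals and
function names are hypothetical.

```python
import numpy as np

def is_encode(L, R, band_edges):
    """Sum of both spectra plus one energy-matching gain per band and channel."""
    M = L + R
    g_l, g_r = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_m = np.sum(M[lo:hi] ** 2) + 1e-12       # guard against empty bands
        g_l.append(np.sqrt(np.sum(L[lo:hi] ** 2) / e_m))
        g_r.append(np.sqrt(np.sum(R[lo:hi] ** 2) / e_m))
    return M, np.array(g_l), np.array(g_r)

def is_decode(M, g_l, g_r, band_edges):
    """Both channels are scaled copies of the mono spectrum in every band."""
    Y_l, Y_r = np.zeros_like(M), np.zeros_like(M)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        Y_l[lo:hi] = g_l[b] * M[lo:hi]
        Y_r[lo:hi] = g_r[b] * M[lo:hi]
    return Y_l, Y_r

rng = np.random.default_rng(0)
common = rng.standard_normal(64)
L = 1.0 * common + 0.1 * rng.standard_normal(64)   # left spectral coefficients
R = 0.5 * common + 0.1 * rng.standard_normal(64)   # right spectral coefficients
band_edges = [0, 16, 32, 48, 64]

M, g_l, g_r = is_encode(L, R, band_edges)
Y_l, Y_r = is_decode(M, g_l, g_r, band_edges)
```

The decoded channels keep the band energies of the originals, but their fine structure
and phase relation are those of the mono signal, which is why the method is lossy.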



Quantization and Coding. During the last coding step the quantization and coding of the 
spectral coefficients takes place. The quantizers, which are used in the figures for prediction 
along spectral coefficients in frequency direction (Fig. 9.27) and prediction in the frequency 
domain along band-pass signals (Fig. 9.29), are now combined into a single quantizer per 
spectral coefficient. This quantizer performs nonlinear quantization similar to a floating- 
point quantizer of Chapter 2 such that a nearly constant signal-to-noise ratio over a wide 
amplitude range is achieved. This floating-point quantization with a so-called scale factor 
is applied to several frequency bands, in which several spectral coefficients use a common 
scale factor derived from an iteration loop (see Fig. 9.19). Finally, a Huffman coding of 
the quantized spectral coefficients is performed. An extensive presentation can be found in 
[Bos97, Bra98, Bos02].
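
The interplay of nonlinear quantization and a common scale factor can be sketched as
follows. The 3/4 power law and the step size 2^(sf/4) follow common descriptions of AAC
quantization and should be read as assumptions here; the rounding offset of the
standard, the Huffman coding and the bit-rate control of the real iteration loop are
omitted. The loop simply refines the scale factor of one band until the quantization
noise falls below a hypothetical allowed noise power.

```python
import numpy as np

def quantize(x, sf):
    """Power-law quantizer with scale factor sf (step size grows with 2**(sf/4))."""
    return np.sign(x) * np.round((np.abs(x) * 2.0 ** (-sf / 4.0)) ** 0.75)

def dequantize(q, sf):
    """Inverse mapping of the power-law quantizer."""
    return np.sign(q) * np.abs(q) ** (4.0 / 3.0) * 2.0 ** (sf / 4.0)

def find_scale_factor(band, allowed_noise, sf_start=0, sf_min=-120):
    """Largest scale factor (coarsest step) whose noise power meets the target."""
    for sf in range(sf_start, sf_min - 1, -1):
        err = band - dequantize(quantize(band, sf), sf)
        if np.mean(err ** 2) <= allowed_noise:
            return sf
    raise ValueError("allowed noise not reached within the scale factor range")

rng = np.random.default_rng(0)
band = rng.standard_normal(16)               # spectral coefficients of one band
allowed_noise = 1e-3 * np.mean(band ** 2)    # assumed target: 30 dB below band power

sf = find_scale_factor(band, allowed_noise)
noise = np.mean((band - dequantize(quantize(band, sf), sf)) ** 2)
```

Because of the power law, large coefficients are quantized with larger steps than small
ones, which keeps the signal-to-noise ratio roughly constant over the amplitude range.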
9.7 MPEG-4 Audio Coding

The MPEG-4 audio coding standard consists of a family of audio and speech coding 
methods for different bit rates and a variety of multimedia applications [Bos02, Her02].
Besides a higher coding efficiency, new functionalities such as scalability, object-oriented 
representation of signals and interactive synthesis of signals at the decoder are integrated. 
The MPEG-4 coding standard is based on the following speech and audio coders. 

• Speech coders

- CELP: Code Excited Linear Prediction (bit rate 4-24 kbit/s).

- HVXC: Harmonic Vector Excitation Coding (bit rate 1.4-4 kbit/s).

• Audio coders

- Parametric audio: representation of a signal as a sum of sinusoids, harmonic
components, and residual components (bit rate 4-16 kbit/s).

- Structured audio: synthetic signal generation at decoder (extension of the MIDI
standard, see footnote 1) (bit rate 200 bit/s-4 kbit/s).

- Generalized audio: extension of MPEG-2 AAC with additional methods in the
time-frequency domain. The basic structure is depicted in Fig. 9.19 (bit rate
6-64 kbit/s).

Basics of speech coders can be found in [Var06]. The specified audio coders allow coding with
lower bit rates (Parametric Audio and Structured Audio) and coding with higher quality at
lower bit rates compared to MPEG-2 AAC.

Compared to coding methods such as MPEG-1 and MPEG-2 introduced in previous 
sections, the parametric audio coding is of special interest as an extension to the filter bank 
methods [Pur99, Edl00]. A parametric audio coder is shown in Fig. 9.32. The analysis
of the audio signal leads to a decomposition into sinusoidal, harmonic and noise-like
signal components and the quantization and coding of these signal components is based on
psychoacoustics [Pur02a]. According to an analysis/synthesis approach [McA86, Ser89,
Smi90, Geo92, Geo97, Rod97, Mar00a] shown in Fig. 9.33 the audio signal is represented
in a parametric form given by

$$x(n) = \sum_{i} A_i(n) \cos\left(2\pi \frac{f_i(n)}{f_A}\, n + \varphi_i(n)\right) + x_n(n). \qquad (9.40)$$

The first term describes a sum of sinusoids with time-varying amplitudes $A_i(n)$,
frequencies $f_i(n)$ and phases $\varphi_i(n)$. The second term consists of a noise-like component
$x_n(n)$ with time-varying temporal envelope. This noise-like component $x_n(n)$ is derived by
subtracting the synthesized sinusoidal components from the input signal. With the help of
a further analysis step, harmonic components with a fundamental frequency and multiples
of this fundamental frequency are identified and grouped into harmonic components. The
extraction of deterministic and stochastic components from an audio signal can be found in
[Alt99, Hai03, Kei01, Kei02, Mar00a, Mar00b, Lag02, Lev98, Lev99, Pur02b]. In addition
to the extraction of sinusoidal components, the modeling of noise-like components and
transient components is of specific importance [Lev98, Lev99]. Figure 9.34 exemplifies
the decomposition of an audio signal into a sum of sinusoids $x_s(n)$ and a noise-like
signal $x_n(n)$. The spectrogram shown in Fig. 9.35 represents the short-time spectra of the
sinusoidal components. The extraction of the sinusoids has been achieved by a modified
FFT method [Mar00a] with an FFT length of N = 2048 and an analysis hop size of R_a =
512.

1 http://www.midi.org/.

Figure 9.32 MPEG-4 parametric coder.

Figure 9.33 Parameter extraction with analysis/synthesis.
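
The decomposition of Eq. (9.40) can be illustrated for a single analysis frame. The
sketch below is a strongly simplified stand-in for the modified FFT method of [Mar00a]:
it assumes stationary partials that lie exactly on FFT bin centers, a rectangular
window, a sampling frequency f_A of 44.1 kHz and a known number of partials. All
signal parameters are made up for the example.

```python
import numpy as np

N = 2048                 # FFT length, as in the example above
f_A = 44100.0            # assumed sampling frequency in Hz
n = np.arange(N)
rng = np.random.default_rng(0)

# Test signal: two stationary partials on exact bin centers plus weak noise
true_bins = [50, 120]
true_amps = [1.0, 0.5]
true_phases = [0.3, -1.2]
noise_std = 0.01
x = noise_std * rng.standard_normal(N)
for k, a, ph in zip(true_bins, true_amps, true_phases):
    x = x + a * np.cos(2 * np.pi * (k * f_A / N) / f_A * n + ph)

# Analysis: pick the two strongest bins of the one-sided spectrum (DC/Nyquist excluded)
X = np.fft.rfft(x)
peaks = np.argsort(np.abs(X[1:-1]))[-2:] + 1

# Synthesis of the sinusoidal part x_s(n) from amplitude, frequency and phase
x_s = np.zeros(N)
for k in peaks:
    A_i = 2.0 * np.abs(X[k]) / N     # amplitude estimate
    f_i = k * f_A / N                # frequency estimate in Hz
    phi_i = np.angle(X[k])           # phase estimate
    x_s += A_i * np.cos(2 * np.pi * f_i / f_A * n + phi_i)

# Noise-like component: input minus synthesized sinusoids
x_n = x - x_s
```

For time-varying amplitudes and frequencies the parameters have to be tracked from
frame to frame and interpolated, which is what the cited methods address.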

The corresponding parametric MPEG-4 decoder is shown in Fig. 9.36 [Edl00, Mei02].
The synthesis of the three signal components can be achieved by inverse FFT and overlap-
add methods or can be directly performed by time-domain methods [Rod97, Mei02]. A
significant advantage of parametric audio coding is the direct access at the decoder to the
three main signal components which allows effective post-processing for the generation of
a variety of audio effects [Zöl02]. Effects such as time and pitch scaling, virtual sources in
three-dimensional spaces and cross-synthesis of signals (karaoke) are just a few examples
of interactive sound design on the decoding side.
Figure 9.34 Original signal, sum of sinusoids and noise-like signal.




9.8 Spectral Band Replication 



To further reduce the bit rate an extension of MPEG-1 Layer III with the name MP3pro
was introduced [Die02, Zie02]. The underlying method, called spectral band replication
(SBR), performs a low-pass and high-pass decomposition of the audio signal, where the
low-pass filtered part is coded by a standard coding method (e.g. MPEG-1 Layer III) and
the high-pass part is represented by a spectral envelope and a difference signal [Eks02,
Zie03]. Figure 9.37 shows the functional units of an SBR coder. For the analysis of the
difference signal the high-pass part (HP Generator) is reconstructed from the low-pass part
and compared to the actual high-pass part. The difference is coded and transmitted. For
decoding (see Fig. 9.38) the decoded low-pass part of a standard decoder is used by the
HP generator to reconstruct the high-pass part. The additional coded difference signal is
added at the decoder. An equalizer provides the spectral envelope shaping for the high-pass
part. The spectral envelope of the high-pass signal can be obtained with a filter bank by
computing the RMS values of each band-pass signal [Eks02, Zie03]. The reconstruction
of the high-pass part (HP Generator) can also be achieved by a filter bank and substituting
the band-pass signals by using the low-pass parts [Schu96, Her98]. To code the difference
signal of the high-pass part additive sinusoidal models can be applied such as the parametric
methods of the MPEG-4 coding approach.

Figure 9.37 SBR coder.

Figure 9.38 SBR decoder.

Figure 9.39 shows the functional units of the SBR method in the frequency domain. 
First, the short-time spectrum is used to calculate the spectral envelope (Fig. 9.39a). The 
spectral envelope can be derived from an FFT, a filter bank, the cepstrum or by linear 
prediction [Zöl02]. The band-limited low-pass signal can be downsampled and coded by a
standard coder which operates at a reduced sampling rate. In addition, the spectral envelope
has to be coded (Fig. 9.39b). On the decoding side the reconstruction of the upper spectrum
is achieved by frequency-shifting of the low-pass part or even specific low-pass parts and
applying the spectral envelope to this artificial high-pass spectrum (Fig. 9.39c). An
efficient implementation of a time-varying spectral envelope computation (at the coder 
side) and spectral weighting of the high-pass signal (at the decoder side) with a complex- 
valued QMF filter bank is described in [Eks02]. 
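
The three steps of Fig. 9.39 can be sketched on a single short-time spectrum. The sketch
assumes an FFT spectrum instead of a QMF filter bank, an envelope given by the RMS value
of a few uniform bands, and a plain copy of the low band as HP generator; the difference
signal is omitted. Band edges, the split bin and all function names are hypothetical.

```python
import numpy as np

def band_rms(spec, edges):
    """RMS magnitude of the spectrum in each band [edges[b], edges[b+1])."""
    return np.array([np.sqrt(np.mean(np.abs(spec[lo:hi]) ** 2))
                     for lo, hi in zip(edges[:-1], edges[1:])])

def sbr_encode(X, split, edges):
    """Keep the low band and the spectral envelope (band RMS) of the high band."""
    return X[:split].copy(), band_rms(X, edges)

def sbr_decode(X_low, envelope, edges, n_bins):
    """Rebuild the high band from the low band and impose the envelope."""
    split = len(X_low)
    Y = np.zeros(n_bins, dtype=complex)
    Y[:split] = X_low
    # HP generator: frequency-shifted copy of the low band
    for k in range(split, n_bins):
        Y[k] = X_low[(k - split) % split]
    # Equalizer: scale every band to the transmitted RMS value
    current = band_rms(Y, edges)
    for b, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        Y[lo:hi] *= envelope[b] / (current[b] + 1e-12)
    return Y

rng = np.random.default_rng(0)
N = 1024
# Short-time spectrum of a noise-like frame with a decaying spectral tilt
X = np.fft.rfft(rng.standard_normal(N)) * np.exp(-np.arange(N // 2 + 1) / 200.0)

split = 256                              # first bin of the high band
edges = [256, 320, 384, 448, 513]        # envelope bands of the high band
X_low, envelope = sbr_encode(X, split, edges)
Y = sbr_decode(X_low, envelope, edges, len(X))
y = np.fft.irfft(Y, n=N)                 # time-domain frame with replicated high band
```

Only the low band and four envelope values are needed at the decoder in this sketch; the
fine structure of the high band is replaced by that of the low band.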

9.9 Java Applet - Psychoacoustics 

The applet shown in Fig. 9.40 demonstrates psychoacoustic audio masking effects [Gui05]. 
It is designed for a first insight into the perceptual experience of masking a sinusoidal signal 
with band-limited noise. 

You can choose between two predefined audio files from our web server (audio1.wav
or audio2.wav). These are band-limited noise signals with different frequency ranges. A
sinusoidal signal is generated by the applet, and two sliders can be used to control its 
frequency and magnitude values. 














Figure 9.39 Functional units of SBR method: a) short-time spectrum and spectral envelope,
b) short-time spectrum for audio coding and spectral envelope, c) reconstruction of upper
short-time spectrum with spectral envelope (vertical axes: |X(f)| in dB).
Figure 9.40 Java applet - psychoacoustics.



9.10 Exercises 

1. Psychoacoustics 

1. Human hearing

(a) What is the frequency range of human sound perception? 

(b) What is the frequency range of speech? 

(c) In the above specified range where is the human hearing most sensitive? 

(d) Explain how the absolute threshold of hearing has been obtained. 

2. Masking 

(a) What is frequency-domain masking? 

(b) What is a critical band and why is it needed for frequency masking phenomena? 

(c) Consider $a_i$ and $f_i$ to be respectively the amplitude and the frequency of a
partial at index $i$ and $V(a_i)$ to be the corresponding volume in dB. The
difference between the level of the masker and the masking threshold is -10 dB. The
masking curves toward lower and higher frequencies are described respectively
by a left slope (27 dB/Bark) and a right slope (15 dB/Bark). Explain the main
steps of frequency masking in this case and show with plots how this masking
phenomenon is achieved.

(d) What are the psychoacoustic parameters used for lossy audio coding? 

(e) How can we explain the temporal masking and what is its duration after stop- 
ping the active masker? 
2. Audio coding

1. Explain the lossless coder and decoder.

2. What is the achievable compression factor for lossless coding? 

3. Explain the MPEG-1 Layer III coder and decoder. 

4. Explain the MPEG-2 AAC coder and decoder. 

5. What is temporal noise shaping? 

6. Explain the MPEG-4 coder and decoder. 

7. What is the benefit of SBR? 




